THE ANALYSIS OF PHYSICAL MEASUREMENTS

EMERSON M. PUGH
Carnegie Institute of Technology

GEORGE H. WINSLOW
Argonne National Laboratory

ADDISON-WESLEY
READING, MASSACHUSETTS • PALO ALTO • LONDON • DON MILLS, ONTARIO

This book is in the ADDISON-WESLEY SERIES IN PHYSICS

Copyright © 1966 by Addison-Wesley. All rights reserved. This book, or parts thereof, may not be reproduced in any form without the written permission of the publisher. Printed in the United States of America. Published simultaneously in the Dominion of Canada. Library of Congress Catalog Card No. 66-21265

PREFACE

Some thirty years ago at the Carnegie Institute of Technology one of us (GHW) was a student of the other (EMP) in a course entitled Physical Measurements. The recitation covered most of the topics in this book, and the laboratory experiments were carried on as research projects in which the students found out for themselves what measurements were needed to verify apparently simple fundamental principles. They also determined the accuracy of their verifications and explained any discrepancies. For example, the better students would find that Newton's second law could not be verified by a car being pulled along a track without determining and taking account of both the friction and the moments of inertia of the wheels. Even then the verification was only good to within the accuracy of the measurements. Some time later EMP collected his notes into an unpolished text which was produced only in lithographed form for use in the course. Meanwhile, GHW was finding that he had been taught much that many of his colleagues did not know but needed to know about the production of desired numerical results from raw experimental data.
Thus the teacher had already invested time and effort in a piece of work that was, comparatively speaking, lying idle, and the student had discovered concrete evidence of the value of that work. Collaboration in the production of a text suitable for publication seemed natural. Much rewriting and expansion to include topics beyond the level of sophomores majoring in physics was undertaken, but the early version remains as the hard core of our early chapters.

We consider this book to be not only a suitable text for undergraduates majoring in science or engineering but also a useful guide for professional experimentalists in the physical sciences. It starts on a very elementary level with discussions of the nature of the numbers one meets in measurements, of methods of numerical approximation, and of the use of graphs and so on, so that the reader will pass four chapters before he comes to one entitled "Errors." The degree of complexity increases fairly steadily throughout the book, so that the reader will find Chapter 12 much more demanding. These demands are only on his patience and facility, however; except for a knowledge of algebra, trigonometry, and elementary calculus, we have tried to make the book mathematically complete. This has been done by the inclusion of appendixes in which that knowledge is used as a basis for the development of mathematical results needed in the body of the text.

The nature of the book is also indicated by its history. In particular, it is not a source book on the foundations of statistical theory. Although we have used the method of maximum likelihood, for instance, to justify the principle of the minimization of the squares of residuals as a device for obtaining the most probable values of parameters, we have not explored the method further.
Although we have tried to make it clear that the values so obtained, and especially the standard deviations computed for them, can never be more than estimates, we have adopted the practice of referring simply to "standard deviation" or "the best estimate of the standard deviation" as being less confusing than the greater particularity that would be desirable in a theoretical work. Our purpose is to emphasize objective, practical, and appropriate numerical calculations rather than to present a study of the theory.

Despite this disclaimer, the reader will find much of the book to be mathematical in nature. We hope to encourage the routine use of the sorts of calculations described here, and we believe it necessary to this purpose that we gain the confidence of the reader by showing details of the processes by which various conclusions are reached. More than that, however, there are no recipes that will cover all contingencies. We hope that the inclusion of the mathematical detail will help the reader toward the application of the concepts to the development of the "recipes" he needs for each of his own particular problems.

The reader will not find an enormous number of problems. We have included problems not so much for routine drill as for additional illustration and instruction. Many of them could have been used as examples in the text; we have made them problems in the belief that the reader will develop a greater understanding of the concepts if he works them through himself.

We are indebted to Blaisdell Publishing Company, Waltham, Mass., for permission to reprint the integral of the normal error function from Joseph B. Rosenbach et al., Mathematical Tables, p. 187 (1943). We are indebted to the Literary Executor of the late Sir Ronald A. Fisher, F.R.S., Cambridge, to Dr. Frank Yates, F.R.S., Rothamsted, and to Oliver & Boyd Ltd., Edinburgh, for their permission to reprint Tables III and IV from their book, Statistical Methods for Research Workers (12th edition, 1954). We are also indebted to the Iowa State University Press for permission to reprint Table 10.5.3 from Statistical Methods by George W. Snedecor (5th edition, 1956).

Finally, we wish to acknowledge the patience and support of our wives, Ruth Pugh and Marty Winslow, the typing of early drafts by the latter, and the typing of the entire final manuscript by Eleanor Schenck. Every author knows that if it were not for assistance from relatives, friends, and his publisher, who does much more than manufacture a book, the production of a book would be well-nigh impossible.

Pittsburgh, Pennsylvania    E.M.P.
Argonne, Illinois    G.H.W.
April 1966

CONTENTS

Chapter 1 Introduction
1-1 The analysis of observations
1-2 Recording
1-3 Estimation of tenths

Chapter 2 Accuracy
2-1 Precision vs. accuracy
2-2 Impossibility of determining true values
2-3 Decimal accuracy
2-4 Relative accuracy
2-5 Types of numbers encountered
2-6 Significant figures
2-7 Rounding off numbers
2-8 Large and small numbers

Chapter 3 Approximations
3-1 Negligible magnitudes
3-2 Approximations with differential calculus

Chapter 4 Graphic Analysis
4-1 Curve plotting
4-2 Curve fitting
4-3 Straight-line law
4-4 Nonlinear laws

Chapter 5 Errors
5-1 Systematic errors
5-2 Chance errors

Chapter 6 Probability
6-1 Elementary laws of chance: the analytical method
6-2 The experimental method
6-3 Compound probabilities: the analytical method
6-4 Problem of n dice
6-5 Permutations and combinations
6-6 The binomial distribution
6-7 The Poisson distribution

Chapter 7 Distribution of Chance Errors
7-1 Examples of error distributions
7-2 Characteristics of an error distribution
Chapter 8 The Normal-Distribution Function
8-1 The normal-distribution function
8-2 Normal distribution: an illustrative application
8-3 Poisson distribution for a large expectation value

Chapter 9 Measures of Spread
9-1 The standard deviation
9-2 Probable error
9-3 Average deviation
9-4 Meaning of the measures of spread
9-5 An illustration

Chapter 10 Method of Least Squares
10-1 Fundamental principle
10-2 An example of least squares with more than one unknown
10-3 General solution for linear unknowns
10-4 Least-squares fitting of a straight line
10-5 Observations of unequal weight
10-6 Condition equations
10-7 Nonlinear observation equations
10-8 Computation of the measures of spread
10-9 Treatment of identical readings
10-10 The rejection of observations
10-11 The range: a convenient estimator

Chapter 11 Propagation of Errors
11-1 General problem
11-2 Special cases
11-3 Standard deviations of unknowns calculated from observation equations
11-4 An example
11-5 Internal and external consistency methods of calculating standard deviations for one unknown
11-6 Internal and external consistency methods for calculating standard deviations for more than one unknown
11-7 Rejection of observations: more than one unknown

Chapter 12 Introduction to Statistical Analysis
12-1 Extension to N trials
12-2 Radioactive counting
12-3 Multinomial distribution
12-4 The range
12-5 The $\chi^2$-distribution: tests of fit
12-6 Student's t distribution: comparison of averages
12-7 The $F$-distribution: analysis of variance
12-8 The two-dimensional normal distribution; the correlation coefficient

Appendixes and References
1 Normalization of the normal distribution
2 Evaluation of the standard deviation for the normal distribution
3 Sums of powers of integers; source of mathematical tools
4 Weights for quantities that have been evaluated by the method of least squares
5 A definite integral related to the spread between pairs of observations
6 Certain definite integrals
7 Multiple integration: Jacobians
8 Certain definite integrals of trigonometric functions
9 Solution of simultaneous equations
References

Tables
A Integral of normal error function, $\frac{1}{\sqrt{2\pi}}\int e^{-y^2/2}\,dy$
B $\chi^2$-values
C $t$ test of significance between two sample means ($\bar{x}_1$ and $\bar{x}_2$)
D $F$ test for equality of variances

Index

CHAPTER 1 INTRODUCTION

The object of a study of physical measurements is to develop the thinking and reasoning powers, to furnish the special kinds of mental tools, and to develop the special kind of mental attitude that result from properly making and analyzing various kinds of measurements with particular regard to their precision and accuracy. The special attitude referred to is the full, operating realization that accuracy itself can be made a subject of measurement, that there are relative degrees of accuracy, that accuracy is important in one place and means only a waste of effort in another, that absolute accuracy is an impossibility, and that a measurement by itself is of much less value than when accompanied by a statement of its precision. By far the principal part of this book is devoted to the presentation of these mental tools and, indeed, to their presentation in ways which we hope will be conducive to the growth of the above-mentioned attitude.
This leaves only a small part of the book focused specifically on the mechanics of taking measurements. It should be emphasized, however, that while the authors hope the book will find good use in the libraries of working scientists, its principal use is seen as a text in a course which includes laboratory work.

When a course in physical measurements is given in conjunction with laboratory work or practical work in the physical sciences, its value is enhanced in that it gives the student a certain familiarity with apparatus and a facility in handling it. He acquires the habit of using it to the best advantage and of keeping it in such condition that it can be fully utilized when needed. Furthermore, and this is equally important, the laboratory work provides the student with measurements of his own on which to practice his course work, under conditions which are ideal for properly impressing him with the course material. While the authors have no evidence of the effectiveness of such a course without laboratory work, they do know that the course with laboratory, as taught for several years at the Carnegie Institute of Technology, was very effective.

1-1 THE ANALYSIS OF OBSERVATIONS

Modern science and technology are based on scientific experiments involving measurements. Carefully designed experiments, carefully analyzed, have produced a body of scientific facts that are not and cannot be questioned. For example, the experimental evidence on which Isaac Newton (1643-1727) based his famous laws of motion and of universal gravitation is still valid. The results obtained from later experiments, which showed that Newton's laws are far more accurate than the original evidence on which they were based, are also accepted by modern scientists. Early in the twentieth century new data obtained with even more accurate measurements had become available.
These new measurements led Albert Einstein (1879-1955) to his general theory of relativity, which has superseded Newton's laws. It is important to note that the measurements which established Newton's laws are still valid. In fact, they helped to establish Einstein's theory, for which Newton's laws are an excellent approximation at the velocities involved in those measurements. Many similar examples can be found in the history of science.

While science does contain a large body of unquestionable facts, there are many phenomena in nature which are not well understood, primarily because of the lack of sufficiently well-designed experiments with properly analyzed results. Many concepts and quantities can be established by two or more very different types of investigation. The regularity of nature is such that, when two different scientists carry out different types of experiments to establish a given concept, they must arrive at the same conclusion provided that they are both sufficiently competent scientists and each carries out a sufficiently careful analysis of his data. Nevertheless, and unfortunately, scientific journals and books contain many instances where one statement concerning a given concept conflicts with another. Such difficulties emphasize the importance of the phrases "sufficiently competent scientists" and "sufficiently careful analysis." Careful analysis sometimes reveals that, although two types of experiments were designed to measure the same quantity, they in fact measure different quantities.

Conflicting statements in the scientific literature are frequently due to a not unnatural desire on the part of some beginning scientists to make unusual discoveries early in their lives. When such scientists obtain unexpected experimental data they may seize upon unusual explanations when well-established principles are sufficient for their analyses. Such scientists should remember that important new principles are rarely discovered in this manner.
New principles have usually been discovered while carrying out well-planned research programs on fairly well-known phenomena. Wilhelm K. Röntgen (1845-1923) discovered x-rays during his well-designed attempts to improve on the cathode-ray experiments of Lenard and William Crookes (1832-1919). Max Planck (1858-1947) originated the quantum theory while he was attempting to improve on J. W. S. Rayleigh's (1842-1919) theory of blackbody radiation. Planck modified Rayleigh's treatment by expressing it in terms of small units of energy. He was amazed to discover that his theory agreed with experiment when, and only when, he did not pass to the limit of making these units of energy infinitesimal. Since the concept of radiating energy only in finite amounts, or quanta, was so radical, Planck spent nearly ten years attempting to solve the problem without this radical assumption. These ten years of serious but unsuccessful attempts to use previously well-established principles provided one of the strongest arguments for the acceptance of the new quantum theory.

Our purpose is to present and discuss modern methods for obtaining reliable experimental results. The first basic step is to distinguish between errors and mistakes. Mistakes are generally due to some form of carelessness on the part of the experimenter. Errors are inherent in the measuring techniques and require special methods for their elimination or reduction. While there are many types of errors, they are usually divided into two main classes: systematic errors and chance errors.

Systematic errors are due to definite, discoverable phenomena. The period of a simple pendulum depends on the length of the pendulum and the gravitational acceleration, and such a system is frequently used to measure the latter quantity. If the experimenter uses a cord and measures its length before it has been stretched by the weight of the bob, he will be making a systematic error.
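To make the size of such a systematic error concrete, here is a minimal Python sketch based on the small-amplitude formula for a simple pendulum, $T = 2\pi\sqrt{L/g}$, so that $g = 4\pi^2 L/T^2$. All numbers are hypothetical (a 2-mm stretch of a 1-m cord, with $g$ taken as 9.80 m/s²), and the helper name `g_from_pendulum` is ours, for illustration only. Using the unstretched length makes every computed $g$ low by the same fraction, about 0.2%, however many swings are timed, so repeating the measurement cannot reveal it.

```python
import math


def g_from_pendulum(length_m: float, period_s: float) -> float:
    """Gravitational acceleration from an ideal small-amplitude simple pendulum.

    From T = 2*pi*sqrt(L/g) it follows that g = 4*pi**2 * L / T**2.
    """
    return 4 * math.pi**2 * length_m / period_s**2


# Hypothetical values, chosen only for illustration.
g_assumed = 9.80       # m/s^2, taken as the "true" value
hung_length = 1.002    # m, cord length with the bob hanging (stretched)
unstretched = 1.000    # m, cord length measured before the bob was attached

# Period the experimenter would actually observe with the stretched cord.
period = 2 * math.pi * math.sqrt(hung_length / g_assumed)

g_biased = g_from_pendulum(unstretched, period)     # systematic error: wrong length
g_corrected = g_from_pendulum(hung_length, period)  # length measured after hanging

relative_bias = (g_corrected - g_biased) / g_corrected
print(f"biased g = {g_biased:.4f} m/s^2, corrected g = {g_corrected:.4f} m/s^2")
print(f"relative bias = {relative_bias:.2%}")
```

The bias equals the fractional stretch of the cord, $0.002/1.002$, regardless of how precisely the period is timed.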
This error can be eliminated by measuring the length of the cord after the pendulum is hung, or the data can be corrected for it by estimating the change in length from the weight of the bob and the elastic constants of the cord.*

* It would seem obvious that in this example it is simpler to eliminate the error than to correct for it.

When all of the known systematic errors have been eliminated or corrected for, there usually remains in the data a scatter which is produced by unknown causes. To continue with the above example, air currents may affect the motion, or one may not always judge the end of a swing in the same way, and so on. In length measurements that are not expected to be extremely precise one frequently judges coincidence by touch, and touches will vary. As stated above, the exact causes are unknown, though they are usually judged to be of the sort just described. A system of statistical methods has been developed for dealing with these chance errors. Properly used, these methods provide powerful tools for analyzing data. They are ideally suited to modern computer techniques, but they are also very useful where only desk calculators are available.

The fact that computing machines of all kinds are so useful with these statistical methods has led to some misuse of the latter. It is often easier to put the data on a machine than it is to analyze the experiment thoroughly for systematic errors. Such misuse of the statistical methods has led some scientists to discard all statistical treatments. This is very unfortunate, because properly used statistical methods can improve the analysis of any physical experiment.

1-2 RECORDING

The remainder of this chapter will be used for some introductory material on proper laboratory operations.

Formal data sheets or notebooks should be used to record all observations, numerical or otherwise. They should list the important instruments used.
Under no circumstances should observations be recorded elsewhere, and none should be erased. The data sheet is the "notebook of the scientist," which, like that of an accountant, should be a book of "original entry," for reasons that are as important for the scientific investigation of natural phenomena as they are for the legal investigation of indebtedness. If there is reason for doubting the value of any entry, or even if it is obviously wrong, it may be canceled by drawing a line through it; but this should be done in such a way as not to obscure what is written, so that the entry can still be used later if that is found desirable.

The advantages of keeping full, complete records cannot be too strongly emphasized. We might almost say that one should write something down every time he turns around. It is true that most entries of this kind will never be referred to again, but that is sufficiently compensated for by finding a much desired record of some odd little fact, many months after the event, that a less methodical person would not have bothered to note down.

1-3 ESTIMATION OF TENTHS

It sometimes happens that a measurement requires only a slight degree of accuracy, and time and trouble can be saved by making it only roughly. As a general principle, however, it is advantageous to make all measurements as accurately as possible. Thus measurements of lengths made with the meter stick should be expressed not merely to the nearest millimeter but to the nearest tenth of a millimeter. It is not difficult to imagine each of the millimeter intervals of the scale divided into tenths and to make a fairly good estimate of the number of tenths included in the length to be measured. When experienced observers make mental subdivisions into tenths of the smallest intervals on a graduated scale, their estimates seldom differ from one another, but for the beginner the estimation of tenths may be an uncertain process.
To gain proficiency it is better to begin with larger subdivisions, such as a scale of centimeters that has no millimeter marks. The position of a mark placed at random on such a scale can be estimated mentally, and the accuracy of such a determination can be tested by actual measurement with a more finely divided scale.

CHAPTER 2 ACCURACY

It has become reasonably common practice to distinguish between precision and accuracy. This convention will be adopted in this book.

2-1 PRECISION VS. ACCURACY

The word "precision" will be related to the random error distribution associated with a particular experiment or even with a particular type of experiment. The word "accuracy" will be related to the existence of systematic errors, such as differences between laboratories. For example, one could perform very precise but inaccurate timing with a high-quality pendulum clock that had the pendulum set at not quite the right length. As another example, two different laboratories could make comparisons of masses for which each uses a beam balance of the same model made by the same manufacturer. The precision at the two laboratories would very probably be the same, as nearly as one could tell, but it is also probable that there would be a difference in the ratio of the arms on the two balances that could be demonstrated by proper procedures. One of these balances would certainly have a ratio nearer unity than the other, and unless appropriate corrections are made on the latter, the results obtained with it would be less accurate.

Finally, it should be emphasized that references to precision or accuracy are pertinent to the absolute size of the error involved and not to the fractional error. Confusion on this point leads to confusion in the concept of the weight of an observation, i.e., the idea that some observations should be given greater consideration than others.
Thus a pressure of 0.5 atmosphere (atm) known to 0.1 atm is known more precisely than is a pressure of 10 atm known to 0.5 atm. On the other hand, the logarithm of the second pressure is more precise than is the logarithm of the first, since an uncertainty Δp in p produces an uncertainty of about Δp/p in ln p: 0.5/10 = 0.05 for the second pressure but 0.1/0.5 = 0.2 for the first.

Many of the remarks to be made apply equally to precision and accuracy. Where this is the case, no attempt to distinguish between them will be made.

2-2 IMPOSSIBILITY OF DETERMINING TRUE VALUES

Beginning students in technical courses have some difficulty in grasping the idea that accuracy is always a relative matter and that absolute precision of measurement is an impossibility, because they have had little practice in careful measurement and their previous study of arithmetic has emphasized infinite accuracy in numerical values. Such a number as 12.8 has been supposed not only to mean the same thing as 12.80 but also to be equal to 12.800000 . . . , to an unlimited number of decimal places. This is quite proper and satisfactory so long as one realizes that he is dealing with ideal quantities: perfections of measurement which have no more reality of existence than the point, line, plane, or cube of geometry. The smoothest surface of a table does not come as near to being a plane as does the surface of an "optically worked" block of glass or a "Whitworth plane," and even the smoothest possible surface can be magnified to show that it contains irregularities everywhere. If it were magnified enough, we could see that its shape would not even remain constant, but individual molecules would be found swinging back and forth or escaping from the surface. A geometrical plane certainly corresponds to nothing in reality and, in general, perfect accuracy of number is just as much an imaginary concept.

2-3 DECIMAL ACCURACY

If 12.8 cm, as a measurement, does not mean the ideal number 12.800000000 . . . to an infinite number of decimal places, what does it mean?
Since different measurements are likely to be made with different degrees of accuracy, the universally adopted convention is merely the common-sense one that the statement of a measurement must be accurate as far as it goes, and it should go far enough to express the accuracy of the determination. Thus "12.8 cm" means a length that is nearer to precisely 12.80 . . . than to precisely 12.70 or 12.90 cm; i.e., its "rounded-off" value would be 12.8 cm, not 12.7 or 12.9. If a length is written "12.80 cm," however, the implication is that the stated measurement is nearer to this same precise 12.800000 . . . than it is to either 12.79 or 12.81 cm. In other words, it is implied that it has been measured to hundredths of a centimeter and found to be between 12.795 and 12.805 cm, so that it can properly be rounded off to 12.80. The previous description, "12.8 cm," means that the measurement is between 12.75 and 12.85; it says nothing about hundredths of a centimeter and can correctly represent any length between the limits just given, for example 12.75, 12.76, 12.77, 12.78, 12.79, 12.81, 12.82, 12.83, or 12.84,* for each one of these could be rounded off to 12.8. To write the length 12.8 cm in the form "12.80 cm" would be to violate the rule that a statement should be accurate as far as it goes, for it would go as far as hundredths, and the chances are ten to one that it would be one of the other numbers of hundredths given above. On the other hand, if an observer determined a length to be 12.80 cm, that is, if he looked for hundredths and established the fact that there were none of them, then to state the result only as 12.8 cm would not be doing justice to his own measurement, for he would imply that the correct number of tenths was merely known to be nearer 8 than 7, namely greater than 7.5, whereas he had actually found it to be nearer 8 than 7.9, that is, greater than 7.95.

When a carpenter says "just 8 in.," he probably means "nearer to 8 in. than to the next graduation on his rule on either side," half a graduation one way or the other usually being unimportant. When a modern machinist says "just 8 in.," he may mean "nearer to 8.000 than to 7.999 or to 8.001," 0.0005 in. usually being negligible to him. When another person says "just 8 in.," we must know what kinds of material he works with before we can tell the meaning of his word "just." If decimal subdivisions were used everywhere, the carpenter's 8 in. would mean 8.0, while the machinist's would mean 8.000; for one man "8" would mean "between 7.950 and 8.050," while for the other it would mean "between 7.9995 and 8.0005." It is to avoid such ambiguities that scientists have adopted the rule that "8" means "between 7.5 and 8.5," "8.0" means "between 7.95 and 8.05," and "8.00" means "between 7.995 and 8.005." In other words, no more figures should be written down than are thought to be correct, and no figures that are thought to be correct should be omitted.

In experimental measurements the last figure, or even the last two figures, of an observation may be uncertain. These uncertain figures should be written down anyway, since the average of a number of such observations is more certain than any single one. The safest rule is to retain one or at most two doubtful figures in an observation and only one doubtful figure in a final result of the experiment.

There is not much danger of anyone's "rounding off" a carefully obtained measurement like 2.836 g to 2.84 g merely for the sake of doing some rounding off. There is a very decided danger, however, of forgetting to write down a final significant zero: if two lengths are 147 mm and 160 mm, the tendency when writing them in centimeters is to put down 14.7 and 16 rather than 14.7 and 16.0. If the 0 is as important as the 7 when writing millimeters, the same is equally true when writing centimeters.

* For a discussion of the convention by which 12.75, like 12.84, is rounded off to 12.8, see Section 2-7.
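The convention just described can be stated mechanically: a recorded value implies an interval of plus or minus half a unit in its last written digit. Below is a minimal Python sketch of that rule. It uses the standard `decimal` module because a binary float would discard the written trailing zero that distinguishes "12.80" from "12.8"; the function name `implied_interval` is ours, for illustration only.

```python
from decimal import Decimal


def implied_interval(recorded: str) -> tuple[Decimal, Decimal]:
    """Return (low, high) implied by a recorded value: +/- half a unit in its last written digit."""
    value = Decimal(recorded)                 # keeps trailing zeros, e.g. "12.80"
    last_place = value.as_tuple().exponent    # -1 for "12.8", -2 for "12.80", 0 for "8"
    half_unit = Decimal(5).scaleb(last_place - 1)  # 0.05 for "12.8", 0.005 for "12.80"
    return value - half_unit, value + half_unit


for s in ["8", "8.0", "8.00", "8.000", "12.8", "12.80"]:
    low, high = implied_interval(s)
    print(f"{s:>6} means between {low} and {high}")
```

Run as written, it reproduces the ranges quoted above: "8" gives 7.5 to 8.5, "8.0" gives 7.95 to 8.05, "12.8" gives 12.75 to 12.85, and "12.80" gives 12.795 to 12.805.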
2-4 RELATIVE ACCURACY

Draw two rectangles of dimensions 20.0 cm × 20.5 cm and 2.0 cm × 2.5 cm, which are approximate squares. The difference between the height and the width of the larger one is just the same as the difference between the height and the width of the smaller, yet the small rectangle is obviously a less accurate approximation to the shape of a perfect square than is the large one. This may serve as an illustration of the fact that the main interest may sometimes be in the relative accuracy rather than in the absolute accuracy. A difference of 0.5 cm has the same absolute value wherever it occurs, but it is a considerable part of a 2-cm length while it is relatively insignificant in comparison with a 20-cm length. Accordingly, the relative accuracy of a measurement depends on two things: the absolute difference and the size of the measurement itself.

If two points on the earth's surface are found by careful survey to be 10 miles (mi) apart, the determination of distance may easily be in error by more than 1 ft, and even with the most careful triangulation the error is likely to be as much as 4 in. However, an error of ¼ in. in measuring the thickness of a door could hardly be made even with the clumsiest measuring device. Nevertheless, it would be misleading to say that the "clumsy" measurement should be considered necessarily less accurate than the careful one; ¼ in. is less than 4 in., and one's attitude should reflect the way the results are to be used. It is true that it is often important to consider what fraction of the total measurement the error amounts to. Suppose the thickness of the door is 1½ in. How large a part of this measurement is the error of ¼ in.? Obviously it is one-sixth of the total, or an error of more than 16%, while 4 in. out of 10 mi is roughly an error of 1 part in 150,000, or about 0.0006 of 1%.

The relative error of a measurement usually does not need to be calculated with great care.
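The two relative errors compared in this section each take a single line of arithmetic. Below is a minimal Python sketch. It assumes 4 in. over 10 mi for the survey and, for the door, an error of ¼ in. in a thickness of 1½ in. (the pair that gives the stated one-sixth), with 63,360 in. per mile. The name `relative_error` is ours, for illustration only.

```python
INCHES_PER_MILE = 5280 * 12  # 63,360 in.


def relative_error(error: float, measurement: float) -> float:
    """Fraction of the whole measurement that the error amounts to (same units for both)."""
    return abs(error) / abs(measurement)


survey = relative_error(4.0, 10 * INCHES_PER_MILE)  # 4 in. in 10 mi
door = relative_error(0.25, 1.5)                     # 1/4 in. in 1 1/2 in.

print(f"survey: about 1 part in {1 / survey:,.0f} ({survey:.5%})")
print(f"door:   {door:.3f} of the thickness ({door:.0%})")
```

The survey comes out at 1 part in 158,400, a few ten-thousandths of one percent, and the door at one-sixth, nearly 17%. The two figures differ by more than four orders of magnitude, which is why only the position of the decimal point really matters here.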
Where numbers are as different as 0.0006% (the 10-mi survey above) and 16% (the thickness of the door), the location of the decimal point is more important than the size of the significant figures in either case; to call the former number "a few ten-thousandths of one percent" and the latter "some 10 or 20 percent" gives the important information needed. This means that calculations of relative error seldom need to be done on paper but can be worked out mentally.

Frequently in reporting results it is desirable to state the percentage of difference between two numbers, and an ambiguity arises as to which of the two numbers should be divided into their difference. The following rules are generally agreed upon. The relative difference between 3.11 and π (= 3.14) is 3/314, and the relative difference between 3.17 and π is likewise 3/314, not 3/317; i.e., the numerical error is to be divided by a true or theoretical value rather than by an experimental or erroneous value. It is often desirable, however, to compare two values which are equally good according to one's available knowledge of them. When there is no standard and no reason for choosing one of the measurements over the other, the accepted procedure is to divide the difference by the greater value. For example, the numbers 4 and 5 would be said to differ from each other by 20%, not 25%, for the difference divided by the greater number is one-fifth, not one-fourth.

2-5 TYPES OF NUMBERS ENCOUNTERED

In technical and scientific literature the numbers encountered can usually be recognized to be of three different types with respect to accuracy:

1. Numbers obtained from experimental data, which were discussed in the preceding paragraphs, especially Section 2-3. They are written with as many digits as are justified by the accuracy of their determination.

2. Exact numbers, such as the ½ in the kinetic-energy formula, $\tfrac{1}{2}mv^2$, or the 4 in the formula for the area of a sphere, $4\pi r^2$.
Such numbers are always written as shown here but are treated in calculations like 0.50000... and 4.00000..., respectively, each with as many decimal places as the particular calculation requires. The number $\pi$ also falls in this classification. Exact numbers are easily recognized, so writing them as ½, 4, and $\pi$ causes no confusion. It also avoids the awkwardness that would result from writing them as 0.50000..., 4.00000..., and 3.14159....

3. Illustrative numbers such as those found in textbook problems. These numbers generally come from imaginary experiments and are usually chosen as simple integers to simplify the calculations. For example: "A sled slides 15 ft from rest in 3 sec. What is its acceleration?" or "The legs of a right triangle are 2 and 3 ft long. What is the length of the hypotenuse?" From the answers given for such problems, one usually finds that the author wants the student to take these numbers as 15.0 ft and 3.00 sec, and 2.00 ft and 3.00 ft, respectively. In cases like this one should treat the numbers given as if they were of reasonable accuracy. If no other information is available, that is usually the accuracy required to give three significant figures in the answer, or the accuracy of careful work with a ten-inch slide rule. The use of this kind of number in illustrative problems is justified because it simplifies calculations and thus puts the emphasis on learning fundamental principles rather than on doing arithmetic.

2-6 SIGNIFICANT FIGURES

At first glance, one might think the discussions in this section and in many succeeding sections outmoded for those who have access to automatic digital computers. Personal experience, however, has shown that this is not the case. The material in this book makes a good foundation on which to build the more advanced methods of numerical analysis needed for proper use of the larger machines. It can also be applied immediately to the smaller computers and desk calculators available to almost every scientist and engineer.

When one wishes to determine the accuracy to be expected from a calculation in which experimental quantities are involved, the concept of significant figures is simple and very useful. It also helps eliminate much useless calculation that might otherwise be thought necessary. The concept is best explained by examples. Each entry in Table 2-1 has the number of significant figures indicated at the top of its column.

Table 2-1. Numbers having the indicated number of significant figures

Number of significant figures:   2           3            4
                                 23          23.2         23.20
                                 0.43        0.432        0.4321
                                 0.0078      0.00780      0.007806
                                 63 × 10³    63.0 × 10³   63.04 × 10³

A simple rule for significant figures in division, multiplication, raising to powers,* and extracting roots is as follows. Retain in the answer the same number of significant figures as, or at most one more than, the component quantity of least relative accuracy, i.e., the one that has the smallest number of significant figures. In general, time will be wasted in calculations of this kind if any quantity used in them carries more than one significant figure beyond the number in the least accurate quantity. There are two reasons the rule is useful. In the operations mentioned, the answer will have approximately the same percentage of "uncertainty" as the quantity in the calculation that has the largest percentage of "uncertainty." And the number of significant figures in a quantity is a rough measure of its percentage accuracy. The rule suggests retaining one extra significant figure as a safety measure, since the significant-figure concept is not accurate enough for all our purposes.
This will be seen from the following examples. Divide 99.8 by 9.94. Take the "uncertainty" in a number to be one unit in the last significant figure. Then the number 99.8 should lie between 99.75 and 99.85, a spread of 0.1, or about 0.1% of the value. The result, then, should be written 10.04, not 10.0, since the uncertainties in 99.8, 9.94, and 10.04 are all about 0.1%, whereas the uncertainty in 10.0 is 1%. As another example, note that 11.1, 23.2, 45.1, 72.8, and 98.2 all have three significant figures. Their uncertainties, however, are approximately 1, 0.4, 0.2, 0.13, and 0.1%, respectively. This last example shows that, among quantities which all have the same number of significant figures, those with the smallest first digit have the largest percentage of uncertainty.

* The simple rules given here for significant figures apply less accurately to the operation of raising to powers than they do to the operations of multiplication and division. This follows from the fact that if a quantity is raised to a power n, the percentage of uncertainty in the result is n times as great as that of the original quantity.

Numbers that by these rules are found to have too many significant figures should be rounded off until they contain the correct number of figures. If the number being rounded off has a smaller first digit than the least accurate number, it should be left with one more significant figure than the least accurate number.

The rules just given for the use of significant figures do not apply to addition and subtraction. When numbers are to be added or subtracted, each should be rounded off to the number of decimal places of the number in the group that has the fewest. For example, if 2.823 is to be added to 586.3, it should be written 2.8 and the sum written as 589.1. The sum is not 589.123. Such a result would imply that 586.3 means 586.300, whereas it means only some number between 586.25 and 586.35.
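The rules of this section can be tried out with Python's standard `decimal` module. Unlike binary floating point, it keeps the digits of a number exactly as written, trailing zeros included, so a literal's significant figures and last recorded place can be read directly from it. This is a sketch under that assumption. The functions `sig_figs`, `relative_uncertainty`, and `add_measured` are hypothetical helpers, not part of any library. Rounding uses `ROUND_HALF_EVEN`, which matches the rule of Section 2-7.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def sig_figs(text: str) -> int:
    """Count significant figures in a decimal literal such as '0.00780' or '63.0e3'.

    Decimal drops leading zeros and keeps trailing zeros after the point, as in
    Table 2-1. Caveat: an integer such as '2300' counts as 4 figures; write
    '2.3e3' if only two are meant.
    """
    return len(Decimal(text).as_tuple().digits)


def relative_uncertainty(text: str) -> float:
    """One unit in the last recorded place, divided by the magnitude of the value."""
    value = Decimal(text)
    unit = Decimal((0, (1,), value.as_tuple().exponent))  # e.g. 0.1 for '99.8'
    return float(unit / abs(value))


def add_measured(*texts: str) -> Decimal:
    """Round each term to the fewest decimal places in the group, then add."""
    values = [Decimal(t) for t in texts]
    coarsest = max(v.as_tuple().exponent for v in values)  # -1 for '586.3'
    quantum = Decimal((0, (1,), coarsest))
    rounded = [v.quantize(quantum, rounding=ROUND_HALF_EVEN) for v in values]
    return sum(rounded, Decimal(0))


# The division example: all but '10.0' carry about 0.1% uncertainty.
for t in ["99.8", "9.94", "10.04", "10.0"]:
    print(t, sig_figs(t), f"{relative_uncertainty(t):.2%}")

print(add_measured("2.823", "586.3"))  # 589.1
```

Reading the uncertainty from the last recorded digit matches the convention used in the text: one unit in the last significant figure.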
When calculations must be made with complex formulas, care must be taken not to round off the figures too much or too soon. One must be especially careful when any quantity appears in more than one place in the formula. The following problem, found in a number of elementary texts, illustrates the principle. A simple pendulum has a period of 2 sec. If an increase in temperature makes the pendulum 1 mm longer, how many seconds would it lose in a day?

Suppose the statement that the period is 2 sec means that the period is somewhere between 1.5 and 2.5 sec. Then the problem has no meaningful solution, since the uncertainty will be so much greater than the change produced by the increased length. Obviously the author did not state his problem accurately according to the principles set forth here, and therefore the student must make assumptions as to what the author meant. It appears reasonable to assume that he meant the original pendulum would make just 86400/2.00000 = 43200 complete swings per day. The period must then be accurate to five or six significant figures. The period of one complete swing is given by $T = 2\pi\sqrt{L/g}$. Hence, with $g = 980.16$ cm/sec²,

$$L_1 = \frac{gT_1^2}{4\pi^2} = \frac{980.16 \times (2.00000)^2}{4\pi^2} = \frac{980.16}{9.8696} = 99.311\ \text{cm}.$$

Assume that the increase in length is just 1.00 mm. Then $L_2 = 99.411$ cm and $T_2 = 2\pi\sqrt{99.411/980.16} = 2.00101$ sec. The lengthened pendulum will lose $(2.00101 - 2.00000) \times 43200 = 43.6$ sec per day. Clearly, one must carry five or six significant figures in these calculations to obtain two or three significant figures in the answer.

In cases of this kind it is usually possible to derive an approximate expression that can be calculated with fewer significant figures while retaining the same accuracy. This will be illustrated here; general methods will be discussed in the next chapter. In the present case, the equation $T_1 = 2\pi\sqrt{L_1/g}$ can be divided into the equation $T_2 = 2\pi\sqrt{L_2/g}$, giving $T_2/T_1 = \sqrt{L_2/L_1}$.
Set $L_2 = L_1 + l$. Then

$$T_2 = T_1\sqrt{1 + \frac{l}{L_1}} = T_1\left(1 + \frac{l}{2L_1} - \frac{l^2}{8L_1^2} + \cdots\right).$$

This last expression is an expansion by the binomial theorem in which all terms from the third on in the parentheses are very small. Hence

$$T_2 = T_1\left(1 + \frac{l}{2L_1}\right) \qquad\text{or}\qquad T_2 - T_1 = \frac{T_1\, l}{2L_1}.$$

Now $T_1 = 2.00000$ sec, $l = 0.100$ cm, and $L_1 = 99.311$ cm. A simple slide-rule calculation shows that $(T_2 - T_1) \times 43200 = 43.5$ sec. The 43.6 sec found above differs only because $T_2$ was rounded to 2.00101 sec before the subtraction.

2-7 ROUNDING OFF NUMBERS

When a number is rounded off, the last digit retained should be increased by one whenever the adjacent digit being discarded is greater than 5. It should also be increased if that digit is 5 and there are digits other than zero to its right. If the discarded digit is just 5 and there are no known digits to its right, or only zeros, the retained digit should be increased by one if such an increase makes it even. If the retained digit, in this case, is already even, do not alter it. This last rule is purely arbitrary. It does, however, reduce the danger of cumulative errors that might be introduced by a habit of always increasing, or never increasing, the last retained digit by one when the discarded digit is just 5. Of course, the same result would be produced by substituting "odd number" for "even number" in the rule. The use of "even number" has the slight advantage of avoiding a second decision if the number is later to be divided by two. For illustration, the following numbers have been properly rounded off to three significant figures:

12.346    12.3
12.350    12.4
12.250    12.2
12.251    12.3
12.351    12.4
12.349    12.3

There are times, however, when numbers should not be rounded off while making the calculations. In solving simultaneous equations it is necessary to assume that the numerical coefficients are known precisely, just like the ½ in the expression $\tfrac{1}{2}mv^2$. Thus during the solution, numbers must be used that have many more digits than those in the coefficients.
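The rounding rule and the warning against rounding intermediate products can both be tried with Python's standard `decimal` module. Its `ROUND_HALF_EVEN` mode implements the even-digit rule for an exact 5. In this sketch `round_sig` and `det2` are illustrative helpers of our own. The 2 × 2 coefficients are hypothetical three-figure values, chosen so that the two products in the determinant nearly cancel.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_sig(value, figures: int) -> Decimal:
    """Round to the given number of significant figures, ties going to the even digit."""
    d = Decimal(value)
    if d == 0:
        return d
    # adjusted() is the power of ten of the leading digit: 1 for 12.346.
    quantum = Decimal((0, (1,), d.adjusted() - figures + 1))
    return d.quantize(quantum, rounding=ROUND_HALF_EVEN)


# The six illustrations above, rounded to three significant figures.
for text in ["12.346", "12.350", "12.250", "12.251", "12.351", "12.349"]:
    print(text, round_sig(text, 3))


def det2(a11, a12, a21, a22, figures=None):
    """Determinant a11*a22 - a12*a21, optionally rounding the two products first."""
    p = Decimal(a11) * Decimal(a22)
    q = Decimal(a12) * Decimal(a21)
    if figures is not None:
        p, q = round_sig(p, figures), round_sig(q, figures)
    return p - q


# Hypothetical coefficients; the products are 11.2299 and 11.2176.
coeffs = ("1.23", "4.56", "2.46", "9.13")
print(det2(*coeffs))             # 0.0123
print(det2(*coeffs, figures=3))  # 0.0 (both products round to 11.2)
```

Rounding the products to the three figures of the input wipes out the determinant entirely. The unrounded difference keeps three significant figures.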
The reason can be seen by setting up a solution of such a problem by determinants. Differences between numbers which are very close to each other are often needed. These numbers are products of two of the pieces of input information. They can turn out to be identical, or nearly so, to within the percentage accuracy, or number of significant figures, of one of the pieces of input information. Were these intermediate numbers rounded off before the difference was taken, the difference could become zero. The difference between the unrounded numbers will usually have a number of significant figures closer to that of the input information. The number of figures kept in the results should also conform to the input information.

2-8 LARGE AND SMALL NUMBERS

To avoid a long string of figures when writing very large or very small numbers, it is customary to divide a number into two factors, one of them being a power of ten. Thus 0.000562 and 23,600,000 are the same as $562 \times 10^{-6}$ and $23.6 \times 10^{6}$, respectively. This notation also makes it possible to write 93,000,000 unequivocally with whatever number of significant figures is desired. It can be put in the form $93 \times 10^6$ or $93.0 \times 10^6$ or $93.00 \times 10^6$, etc. The same value and accuracy as $93.00 \times 10^6$ would be retained just as well by writing $9.300 \times 10^7$ or $930.0 \times 10^5$. However, the Committee of the American Physics Teachers that studied this matter recommended that, wherever feasible, the power of ten should be three or a multiple of three. If all numbers are written in this way, comparison of different quantities will be greatly facilitated.

PROBLEMS

1. Suppose you are instructed to draw a line "just ten" centimeters long, using an ordinary centimeter scale. Will its length be best expressed as 10 cm, 10.0 cm, 10.00 cm, or 10.000 cm?
Answer: 10.0 cm

2.
The density of water in g/cc is, for the temperature given:

0°C     0.999841
4°C     0.999973
10°C    0.999700
20°C    0.998203
30°C    0.995646

(a) Does water at 4°C have a density of 1.00, 1.000, 1.0000, or 1.00000? (b) Correct the following statement by crossing off only the unjustifiable figures: "At ordinary room temperatures water has a density of 1.00000."
Answer: (a) 1.0000 (b) 1.00

3. The number $\pi$ is 3.14159265358.... Find the percent error in the approximations: (a) 22/7 for $\pi$; (b) 355/113 for $\pi$; (c) 10 for $\pi^2$. As described in Item 2 of Section 2-5, 22, 7, 355, etc. are assumed to be exact.
Answer: (a) 0.040% (b) $8.5 \times 10^{-6}$% (c) 1.32%

4. The number e is the limit of the sum of the infinite series

$$1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots,$$

which is 2.7182818.... What percent error is made by taking (a) the first three terms, (b) the first four terms?
Answer: (a) 8.0% (b) 1.90%

CHAPTER 3 APPROXIMATIONS

Suppose that a ruler graduated in centimeters and millimeters is used to measure the side of a square. By estimating tenths of a millimeter, the length is found to be 2.87 cm. The mathematical square of this quantity is 8.2369 cm². But we have already seen that only three figures of this area can be trusted, because we know nothing about the fourth figure of the measurement from which it is derived. Accordingly, the square is said to have an area of 8.24 cm². Similarly, suppose one side of a square measures 1.03 cm, correct to tenths of a millimeter, with nothing known about hundredths of a millimeter. Then its area will be correctly expressed as 1.06 cm², not 1.0609 cm². The volume of a cube that has this square for one of its faces will be 1.09 cm³ rather than 1.092727 cm³.
It should be noted that:

(a) with an ordinary ruler it is impossible to measure a length of a few centimeters with an accuracy greater than is expressed by three significant figures;

(b) the area or volume calculated from such data cannot be trusted beyond its third significant figure;

(c) the example just given suggests a remarkably simplified process of calculation when some quantity to be squared or cubed is a little greater or a little less than unity. Thus $(1 + \delta)^2 \approx 1 + 2\delta$ and $(1 + \delta)^3 \approx 1 + 3\delta$, provided that $\delta \ll 1$.

3-1 NEGLIGIBLE MAGNITUDES

The justification of this simplified process of raising a number close to unity to a power will perhaps be made clearer by the following example. Suppose that a metal cube has been constructed accurately enough to measure 1.00000 cm along each edge. Suppose it is then brought from a cold room into a warm room. A delicate measuring instrument might show that the change of temperature had increased each dimension to 1.00012 cm. By unabridged multiplication the area of each face would be 1.0002400144 cm² and the volume 1.000360043201728 cm³. Suppose the most careful measurements made it just possible to distinguish units in the fifth decimal place. Then tenths of those units, represented by the sixth decimal place, would be impossible to measure, and any attempt to state hundredths and thousandths of those units as well would be absurd. Note that the number 1.0002400144 differs from the value obtained by abridged multiplication, 1.00024, by only a few thousandths of the smallest measurable amount. This shows clearly why the area of a 1.00012-cm square must be 1.00024 cm². Similarly, the volume of the cube is neither more nor less than 1.00036 cm³, and the string of figures running out ten more decimal places is absolutely meaningless.
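The comparison between unabridged and abridged multiplication can be reproduced exactly with Python's `decimal` module. This is a sketch under that assumption. The helper `first_order_power` is a hypothetical name. It returns both the estimate 1 + nx and the next binomial term, n(n − 1)x²/2, which gauges what the abridged result leaves out.

```python
from decimal import Decimal


def first_order_power(x: Decimal, n: int) -> tuple[Decimal, Decimal]:
    """Return the estimate 1 + n*x and the next binomial term n(n-1)x^2/2."""
    return 1 + n * x, Decimal(n * (n - 1)) / 2 * x * x


edge = Decimal("1.00012")  # cm, the warmed cube of the example
x = edge - 1               # 0.00012

for n, label in [(2, "area, cm^2"), (3, "volume, cm^3")]:
    exact = edge ** n                       # unabridged multiplication
    approx, next_term = first_order_power(x, n)
    # Columns: exact value, abridged value, what was dropped, next binomial term.
    print(label, exact, approx, exact - approx, next_term)
```

The discarded amount for both the area and the volume is a few times 10⁻⁸. That is far below the 10⁻⁵ that the instrument can resolve, and the next binomial term accounts for nearly all of it.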
The examples given above suggest that if x is small compared to one, $(1 + x)^2 \approx 1 + 2x$ and $(1 + x)^3 \approx 1 + 3x$. In general, $(1 + x)^n \approx 1 + nx$. This follows from the binomial theorem, which gives

$$(1 + x)^n = 1 + nx + \tfrac{1}{2}n(n-1)x^2 + \cdots.$$

If x is so small that $n^2x^2$, $n^3x^3$, etc., are negligible compared to nx, we can write $(1 + x)^n \approx 1 + nx$. The third term in the binomial expansion, $\tfrac{1}{2}n(n-1)x^2$, is usually sufficient for determining the inaccuracy of the approximation. Often only the order of magnitude of the inaccuracy is desired, and then a mental calculation of $x^2$ will usually suffice. This approximation is so useful that it should be thoroughly memorized and applied wherever possible.

An extension of the above approximation is also useful. If a, b, and c are small compared to unity and if l, m, and n are not too large (e.g., less than 4 or 5), then

$$(1 + a)^l (1 + b)^m (1 + c)^n \approx 1 + la + mb + nc.$$

The following special cases of these formulas are often useful:

$$\frac{1}{1 + a} \approx 1 - a, \qquad \sqrt{1 + a} \approx 1 + \frac{a}{2}, \qquad \frac{1}{\sqrt{1 + a}} \approx 1 - \frac{a}{2},$$

$$\frac{1 + a}{1 + b} \approx 1 + a - b, \qquad \sqrt{A^2 + d} = A\sqrt{1 + \frac{d}{A^2}} \approx A + \frac{d}{2A}.$$

The reader will probably meet many others.

3-2 APPROXIMATIONS WITH DIFFERENTIAL CALCULUS

Differential calculus provides us with an important method of approximation. It can best be explained by means of an example. Suppose we wish to find the approximate value of sin 31° without the aid of a table. Since sin 30° is 0.5, we know that sin 31° will be close to 0.5, and this might be called our first approximation. Let us assume that this approximation is not sufficiently accurate for our purposes. To obtain a better value, we need to know how rapidly the value of the sine function changes with respect to the angle itself. We can then calculate how much the value of the sine changes as the angle changes from 30° to 31°.
When angles are measured in radians, the change of the sine per unit change in the angle, i.e., the rate of change of the sine, is given by $d(\sin x)/dx = \cos x$. In other words, the rate of change of the sine is equal to the cosine. If we multiply the rate of change of the sine per unit change in the angle by the amount of change of the angle, we obtain approximately the total change in the sine. Thus, as the angle changes from 30° to 31°, the sine of the angle changes by

$$\cos 30^\circ \times 1^\circ\ (\text{in radians}) = \frac{\sqrt{3}}{2} \times \frac{\pi}{180} = 0.01511.$$

Hence sin 31° = 0.5 + 0.01511 = 0.51511. From the tables we find that sin 31° = 0.51504, which shows us that this second approximation is accurate to 0.014%. The small inaccuracy arises because the rate of change of the sine is not exactly constant over the 1° interval, as we assumed.

In general, suppose $y = f(x)$ and the value $y_1$ of y is known for a given value $x_1$ of x. Then the value $y_2$ of y at $x_1 + \Delta x$ is given by

$$y_2 = y_1 + \left(\frac{dy}{dx}\right)_1 \Delta x. \tag{3-1}$$

This equation is obtained from Taylor's infinite series by dropping the terms with powers of $\Delta x$ greater than one. These terms are negligible if $\Delta x$ is small and if the function y is reasonably well behaved. Taylor's series is treated in most calculus texts, but the result for $y_2$ can be understood without an understanding of that theorem. One need only understand that $(dy/dx)_1$ represents the rate at which y increases as x increases, at the particular value of x represented by the subscript 1. When the rate of increase of y with respect to x is multiplied by the change $\Delta x$ in x, the result is the change in y. Thus $y_2$ could be written, and often is, as $y_1 + \Delta y$, and Eq. (3-1) as

$$\Delta y = \left(\frac{dy}{dx}\right)_1 \Delta x.$$

Helpful reference can be made to Fig. 3-1, which shows a case where the expression for $\Delta y$ is not sufficiently accurate. The term in Taylor's series which follows $(dy/dx)_1\,\Delta x$ is $\tfrac{1}{2}(d^2y/dx^2)_1(\Delta x)^2$, which is so large for the function of Fig.
3-1 that an accurate value of $\Delta y$ for the $\Delta x$ shown cannot be obtained if the term in question is omitted. On the other hand, it can be seen that the simpler result would be more accurate for smaller values of $\Delta x$.

[FIGURE 3-1]

As an example, we may want to obtain the value of $\log_{10} 103$, knowing that $\log_{10} 100 = 2$. We let

$$y = \log_{10} x = (\log_{10} e)(\log_e x), \qquad \frac{dy}{dx} = \frac{\log_{10} e}{x} = \frac{0.4343}{x}, \qquad \left(\frac{dy}{dx}\right)_{x=100} = 0.004343.$$

Substitution in Eq. (3-1) gives $\log_{10} 103 = 2 + (0.004343) \times 3 = 2.0130$, which is in error by only 2 in the last figure given.

Clearly, this approximation can be used only where the curve represented by f(x) is continuous and its derivatives are continuous. For example, it would be inconceivable to attempt to approximate the value of tan 91° from the value of tan 89°.

Finally, it should be mentioned that Taylor's series can be applied to functions of more than one variable. It will generally be assumed in this book that only the terms containing the first power of the increments of the variables are needed. Let y be a function f of a set of variables $x_1, x_2, \ldots, x_i$, and let $x'_1, x'_2, \ldots, x'_i$ be some particular set of their values. When the variables receive increments $\Delta x_1, \Delta x_2, \ldots, \Delta x_i$ away from the primed set, the increment $\Delta y$ in y is

$$\Delta y = \left(\frac{\partial f}{\partial x_1}\right)_{x'} \Delta x_1 + \left(\frac{\partial f}{\partial x_2}\right)_{x'} \Delta x_2 + \cdots + \left(\frac{\partial f}{\partial x_i}\right)_{x'} \Delta x_i.$$

Here the subscript x' means that the derivatives are to be evaluated at the primed set of values of the variables. The sign ∂ indicates that the derivative is to be taken with respect to that one variable only. Clearly, however, the numerical value of such a partial derivative will in general depend on the values of all the variables.

PROBLEMS

1. Find approximate values of $(1.066)^n$ and $(0.9988)^n$ for n = 2, ½, −2, −½, and ⅓.
Answer:

n      (1.066)^n    (0.9988)^n
2      1.132        0.9976
½      1.033        0.9994
−2     0.868        1.0024
−½     0.967        1.0006
⅓      1.022        0.9996

2.
Use approximation methods to find: (a) the square root of 50 to four significant figures, (b) (1.00036)/(1.00364).
Answer: (a) 7.071 (b) 0.99673

3. When a ballistic pendulum is struck, the height h to which it rises is related to the horizontal distance d through which it moves by the equation

$$h = R - \sqrt{R^2 - d^2}.$$

This holds with sufficient accuracy if the length of the pendulum, R, is very large compared with d. Take R and d to be exactly 292 cm and 15 cm, respectively. (a) Find h by using an approximation to the square root. (b) If the equation were to be solved exactly, how many significant figures must be retained to get the same result that one can get with a slide rule by the method of part (a)?
Answer: (a) 0.385 cm (b) six

4. The ratio of the length of the left arm of a chemical balance to that of the right arm is $\sqrt{W_1/W_2}$. Here $W_1$ is the apparent mass of a body when it is placed in the left-hand pan and $W_2$ is its apparent mass in the right-hand pan. By letting $W_1 = W_2 + \delta$, find an approximate expression for this ratio. (a) Evaluate the ratio for $W_1 = 24.5028$ g and $W_2 = 24.5002$ g. (b) Supposing $W_1$ and $W_2$ in part (a) to be exact, what error is introduced by use of this approximation? Will this error have any significant effect?
Answer: (a) 1.000053 (b) $5.9 \times 10^{-6}$%

5. Given that

$$y(\sigma) = \frac{1}{\sigma}\, e^{-1/\sigma^2}$$

and y(2) = 0.38940: (a) By using Eq. (3-1) find y(2.01). (b) The third term in Taylor's series, written to conform to Eq. (3-1), is $\tfrac{1}{2}(d^2y/d\sigma^2)_1(\Delta\sigma)^2$. How large would $\Delta\sigma$ have to be for this term to affect the result?
Answer: (a) 0.38843 (b) $\Delta\sigma \approx 0.02$ yields a third term of about $5 \times 10^{-6}$

CHAPTER 4 GRAPHIC ANALYSIS

We shall begin in this chapter the direct approach to the analysis of experimental data. The chapter is titled Graphic Analysis, and graphic analysis is indeed the proper first approach. Nevertheless, we wish to begin with a word of admonition.
Unfortunately, one finds too often in the scientific literature such phrases as "The data appear to fit the equation so and so ..." or "The estimated probable error is ...". Such statements mean that the data have been "analyzed" by methods that cannot be described precisely. It may well be true that the data being described do not follow exactly the normal curve of error, which is the foundation of most computational methods in common use. Even so, the application of the normal error curve via actual computation can be described precisely, and the numerical results obtained could be reproduced by anyone.

On the other hand, familiarity with methods of graphical analysis gives one a "feeling" for such mathematical concepts as continuity, derivative, approach to an asymptote, and so on. It gives one a better understanding of the meaning of residual, which we shall define later, and of the difference between the residual and the hypothetical true error. The methods of graphical analysis are an invaluable aid in understanding more sophisticated computational methods.

4-1 CURVE PLOTTING

It is usually desirable to plot all experimental data on coordinate paper and to draw smooth curves which fit the plotted points as closely as possible. One reason is that the curves so plotted convey a clearer picture of the results of the experiment than the data themselves can. Another is that such curves can frequently give certain accurate information which cannot be obtained otherwise. The shapes of such curves may serve to verify existing laws or may suggest laws which were not previously known. Smooth curves should be drawn through the plotted points because our past experience tells us that physical changes are almost never abrupt. In fact, changes which at first seem abrupt or discontinuous are usually found to be continuous on closer examination.
If the surface of a solid standing in air is examined closely by modern methods, a transition region will be found where substance of the nature of air gradually changes into substance of the nature of the solid. Occasionally one may miss certain rapid changes by the process of "smoothing," in cases where just one or two points lie off the smooth curve. Regnault, the great French physicist, thought that the vapor-pressure curve of ice was continuous with that of water and made one smooth curve link up all the readings. It is now known that at 0°C the two curves intersect with slightly different slopes. An examination of Regnault's diagram shows that his data had actually indicated this fact, but he smoothed it out. Suppose one finds a point which does not lie on a smooth curve and suspects that its departure is not just accidental. He should then make further experiments in the neighborhood of that point to determine the shape of the curve better. If further experiments cannot be performed, he is not justified in drawing anything but a smooth curve.

The usefulness of the plotted curve depends to a great extent on the choice of scales used in plotting. Three rules should be followed in the choice of scales:

(a) choose scales that make plotting and reading easy, e.g., put round numbers on heavy lines of the coordinate paper;

(b) choose scales so that the curve will nearly cover the whole sheet of graph paper;

(c) choose scales such that the accuracy of plotting the points is nearly equal to the accuracy of the data plotted.

If, for example, plotting can be done much more accurately than the accuracy of the data justifies, the points will be unduly scattered, making it difficult to judge the shape of the curve. It will often be impossible to follow all three of these rules at the same time, in which case a compromise will be necessary.
Rule (a) is the most important of the three because "odd" scales invariably lead to errors when points are read off the curves. Whether one should give more weight to rule (b) or to rule (c) will depend somewhat on the purposes of the plot, convenience in use, and the nature of the data. For instance, one often has a dependent variable, y say, governed by the values of two independent variables, x and T. A plot of y vs. x for a single value of T might well cover a conveniently sized sheet of graph paper, so that the plot for a second value of T would not fit on the same sheet. Thus, to show y vs. x for several values of T on the same plot, one must abandon rule (c) in favor of rule (b). On the other hand, the larger-scale plot might be necessary to show the true nature of the dependence of y on x at a single value of T.

4-2 CURVE FITTING

A curve showing the relationship between the variables measured is very useful. It gives a clear idea of how a variation of one quantity affects the other. If one can find a mathematical equation that fits this curve, much more information can be gained. For example, suppose a calorimeter cup of known heat capacity is filled with a measured quantity of hot water, and its temperature is recorded at one-minute intervals. Suppose the temperature is plotted as a function of time. The smooth curve drawn through these points will show that the temperature drops rapidly at first and more slowly as the temperature approaches that of the room. Thus one may infer from the curve that the cup and water lose heat most rapidly when their temperature is high above that of the surroundings. However, more information can be obtained if a mathematical relationship can be found connecting the temperature with time.
By methods that will shortly be described, one should be able to show that the equation $\theta = \theta_0 e^{-bt}$ will fit this experimental curve. Here $\theta$ is the temperature difference between the calorimeter and its surroundings, $\theta_0$ is the value of $\theta$ at time t = 0, and b is a constant which depends on the apparatus. The fitting of the equation $\theta = \theta_0 e^{-bt}$ to the curve will provide numerical values for b and $\theta_0$. By differentiating the equation one finds that

$$\frac{d\theta}{dt} = -b\theta_0 e^{-bt} = -b\theta.$$

This equation tells us that the rate at which the temperature difference drops is proportional to that temperature difference, with b as the constant of proportionality. Furthermore, the rate at which heat is lost from the calorimeter is given by

$$-h\,\frac{d\theta}{dt} = bh\theta,$$

where h is the known heat capacity of the calorimeter and the water. Therefore, bh gives the number of calories of heat given off by the calorimeter per second per degree difference in temperature. Thus a great deal of useful information can be obtained from an experimental curve, especially if a mathematical equation can be fitted to that curve. We shall now take up methods for fitting equations to curves.

4-3 STRAIGHT-LINE LAW

It is very important to know as much as possible about the fitting of straight lines to experimental points. Many experiments yield data that naturally follow straight lines. Moreover, nearly all data can be plotted in some manner so that they will follow a straight line. The straight-line plot yields so much information so easily that it is standard practice, wherever possible, to arrange the data so that they can be fitted in this way. The details of transforming data so that they follow a straight line will be discussed in the next two sections. The treatment of data that naturally follow a straight line must be discussed first. The equation y = a + bx has a straight line for its "curve." This line cuts the y-axis at a height a and has a slope numerically equal to b.
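The cooling law $\theta = \theta_0 e^{-bt}$ can be brought into the straight-line form y = a + bx by taking logarithms: $\ln\theta = \ln\theta_0 - bt$. A plot of $\ln\theta$ against t should therefore be straight, with intercept $\ln\theta_0$ and slope $-b$. The sketch below assumes hypothetical noise-free readings generated from $\theta_0$ = 40 degrees and b = 0.05 per minute; these are not data from the text. It uses an ordinary least-squares line in place of the graphical fitting this chapter describes. `fit_line` is our own helper name.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b


# Hypothetical readings generated from theta0 = 40 deg and b = 0.05 per minute.
times = list(range(11))                               # t in minutes
thetas = [40.0 * math.exp(-0.05 * t) for t in times]  # temperature differences

# ln(theta) = ln(theta0) - b*t: intercept ln(theta0), slope -b.
intercept, slope = fit_line(times, [math.log(th) for th in thetas])
b_fit, theta0_fit = -slope, math.exp(intercept)
print(f"b = {b_fit:.4f} per minute, theta0 = {theta0_fit:.2f} deg")
```

With real readings the points would scatter about the line. The transformation is the same, but the recovered b and $\theta_0$ would then carry uncertainties.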
Computational methods will be described later by which the best, or most probable, values of a and b can be determined. These methods can be extended, in principle and frequently in practice, to functions much more complicated than the linear relationship between two variables. Once the programming has been done, even the most tedious of such problems are easily handled by automatic computers. Indeed, the majority of such problems can be done on a desk calculator without too much difficulty. As discussed in the introduction to this chapter, some results are determined indirectly from experimental data and are to be made available for others to use. These should always be computed by precisely describable methods. Nevertheless, many of the graphical methods that were used before the advent of computers, or before desk calculators became widely available, remain useful. They are quick ways to get sufficiently good results for intermediate use. Because of their basic nature, they also give a better understanding of the principles that underlie the computations. Finally, and particularly in conjunction with automatic computers, they provide a good way of checking computed results. Hence we shall describe some of them.

Suppose a set of experimental measurements, such as those of the temperature and length of a metal bar, is found to correspond approximately to a straight-line law. The measurements may then be plotted as the x's and y's of a graph. Their irregularities may be eliminated, or "smoothed," by drawing the straight line that appears to come closest to all of the points. To locate the best position of the line, the "black-thread method" is preferable to a ruler or other ordinary straightedge, since the thread and the points can all be seen at the same time. A properly placed ruler would hide half of the points.
More convenient than the black thread is a strip of transparent celluloid with a fine straight line scratched down the middle of one face.

As an example, consider the data in Table 4-1.

Table 4-1
x (°C)    y (mm)
100       909.8
90        908.5
80        908.0
70        907.2
60        906.7
50        906.5
40        906.2
30        905.5
20        905.0
10        904.1
0         903.9
−10       903.2
−20       902.3

It is suggested that the reader plot the values given in the table as accurately as possible, making use of the rules given in Section 4-1, and marking each point by a minute dot surrounded by a small circle. Be sure that the coordinate paper rests in a perfectly flat position, and stretch a fine black silk thread on it in such a position that it follows the general direction of the points. Move it a trifle toward the top or bottom of the page; also rotate it slightly, both clockwise and counterclockwise. Attempt to get it into such a position that it lies among the points, following their general trend but not necessarily passing exactly through any one of them. See that there are about as many points above the line as below it; if the high points are more numerous toward one end of the thread and the low ones toward the other end, rotate the thread enough to remedy the condition. When the thread is finally arranged in the most satisfactory position, do not attempt to draw the line at once, but notice where the thread cuts the x-axis and where it cuts the y-axis, or where it passes through some easily remembered intersections on the paper. From these numbers calculate the slope of the thread, noting whether its value is positive or negative. Write the equation of the line indicated by the thread.

The equation x/m + y/n = 1 is another form of the straight-line equation and is easily reducible to the form y = a + bx, with a = n and b = −n/m. Since the graph of x/m + y/n = 1 passes through the points (0, n) and (m, 0), m and n are respectively the x- and y-intercepts of the line.
The equation of a line that cuts both axes can thus be written immediately, without any calculation. The constants a and b in the straight-line equation can be obtained from the intercepts m and n on the axes or from any other two points far apart on the line. It should be noted that since coordinates are physical quantities and have units, the constants m and n, or a and b, also have units. To make the equation x/m + y/n = 1 consistent, m and n must have the same units as x and y, respectively; and to make the equation y = a + bx consistent, a must have the units of y, and b must have the units of y/x. For example, if these constants are determined from a plot of the data in Table 4-1, the units are degrees C for m, mm for n, mm for a, and mm/°C for b.

Sometimes data will be so good that when they are plotted on graph paper no larger than 8.5 by 11 in., most of the available accuracy is lost; i.e., the plotted points fall more accurately on the straight line than can be shown on the plot. Then larger sheets of graph paper are useful. However, large sheets are inconvenient, and it is difficult to make accurate plots on them. For these situations a device called the method of the "residual plot" will be found useful.

[Figure 4-1: y vs. x, together with the residuals Δy vs. x plotted to an expanded scale, for the example of the residual-plot method.]

The method of the residual plot is as follows. First, determine approximate values for the constants a and b of the straight-line equation; call these a' and b'. Second, calculate values of y' for each of the x's, where y' = a' + b'x. Third, calculate values of Δy = y − y'. Fourth, plot values of Δy against x to a larger scale than that on which y was originally plotted. Fifth, determine the slope Δb and the intercept Δa of this residual plot. Sixth, calculate accurate values for the constants from a = a' + Δa and b = b' + Δb.
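The six steps can be traced in a short Python sketch. One substitution should be stated plainly: the text determines Δa and Δb by drawing a line through the residual plot, whereas the sketch below computes them with a least-squares line (a method the book takes up later), so its Δb may differ slightly from a value read off a graph. The data are the four points of the worked example in the text; the function names are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a + b x; returns (a, b).
    Used here in place of drawing the residual line by eye."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b

def residual_plot_fit(xs, ys, a_trial, b_trial):
    """Refine trial constants a', b' by the residual-plot method."""
    # Steps 2-3: trial values y' = a' + b'x and residuals dy = y - y'.
    residuals = [y - (a_trial + b_trial * x) for x, y in zip(xs, ys)]
    # Steps 4-5: intercept da and slope db of the residual "plot".
    da, db = fit_line(xs, residuals)
    # Step 6: corrected constants a = a' + da, b = b' + db.
    return a_trial + da, b_trial + db, residuals

xs = [1, 2, 3, 4]
ys = [5.002, 7.034, 9.059, 11.068]
a, b, dys = residual_plot_fit(xs, ys, a_trial=3.0, b_trial=2.000)
print("residuals:", [round(d, 3) for d in dys])
print(f"a = {a:.3f}, b = {b:.4f}")
```

With the trial values a' = 3.0 and b' = 2.000, the computed correction gives Δa = −0.015 and Δb = 0.0223, i.e. a = 2.985 and b = 2.0223, in close agreement with the graphical values.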
The effectiveness of this method arises from the fact that the Δy's are so small that they can be plotted on a small sheet of graph paper to a much larger scale than the y's can be.

As an example of the method of the residual plot, assume that the plotted points are (1, 5.002), (2, 7.034), (3, 9.059), and (4, 11.068). Determine approximate values of a and b from the trial straight line of Fig. 4-1. These are a' = 3.0 and b' = 2.000. Determine y' from y' = a' + b'x and Δy from Δy = y − y', as shown in Table 4-2.

Table 4-2
x    y         y'      Δy
1    5.002     5.0     0.002
2    7.034     7.0     0.034
3    9.059     9.0     0.059
4    11.068    11.0    0.068

From the plot of Δy vs. x shown in Fig. 4-1, we find that Δa = −0.015 and Δb = 0.023. Since a = a' + Δa and b = b' + Δb, a = 2.985 and b = 2.023.

4-4 NONLINEAR LAWS

Whenever the experimental points lie on a straight line, the law of variation is easily deduced. When the experimental points do not lie on a straight line, as frequently happens, the problem is more difficult. However, if one can guess the proper mathematical equation, it is usually a simple matter to fit this equation to the points. Frequently the law of variation will be known from theoretical considerations, and the problem is then one of determining the constants by fitting the known mathematical equation to the experimental data. If there is no way of knowing the law except from the shape of the smoothed curve, the problem is one of trial and error.

The simplest method of fitting nonlinear equations to experimental data can best be explained by means of an example. Suppose that the equation $y^2 = a + bx^3$ is to be fitted to the data of Table 4-3, which are suspected of following that law. We note that if we set $y^2$ equal to a new variable v and $x^3$ equal to u, the equation becomes v = a + bu, which is linear in v and u. Therefore, a plot of v against u will be linear and can be treated exactly as in Section 4-3. Thus, although a plot (y vs.
x) of the figures in the first two columns of Table 4-3 will not give a straight line, a plot of the figures in the last two columns will. The value of b is given by the slope, and the value of a by the v-intercept, of the straight line drawn.

Table 4-3
x    y       u = x^3    v = y^2
1    2.2     1          4.8
2    4.4     8          19.4
3    7.6     27         57.8
4    11.4    64         130.0
5    15.9    125        252.8

If we had picked the wrong equation in the first place, the new variables plotted would not lie on a straight line, and another equation would have to be tried.

The mathematical functions which most often appear as relations between physical variables are the trigonometric functions, the exponential and logarithmic functions, and powers. It is advisable to become very familiar with the shapes of these functions, alone and in combination, by making graphical plots of equations like the following, with constants a, b, c of various sizes and algebraic signs:

$y = a\sin bx$, $y = a\cos bx$, $y = a\tan bx$,
$y = ae^{bx}\cos cx$, $y = a(1 - e^{bx})$,
$y = ae^{b/x}$ (or $\ln y = \ln a + b/x$),
$y = ae^{-bx^2}$ (with b positive),
and finally $y = \sum_{n=0} a_n x^n$.*

Constants for this last equation could be obtained by using as many points from the smoothed curve as there are constants in the equation selected. The values of x and y for the individual points are substituted into the equation, giving as many equations as there are unknowns, so that the values of the unknowns can be obtained by solving the equations simultaneously. Obviously it is best, if the shape of the curve permits, to use as few terms in the equation as possible so as to reduce the labor required in determining the constants. More importantly, the inclusion of too many terms tends to cause imprecision to be mistaken for precision. The inclusion of seven terms in a sum of powers to fit the data of Fig. 4-2 could produce the solid curve when, in fact, the dashed line might represent the data better. In other words, what is meant by "smoothing," and perhaps what is lost by it, must be carefully considered.

Fig. 4-2. Data scattered about a straight line (dashed) could suggest remarkably complex physical laws (solid line) if care is not taken. Conversely, one must guard against mistaking complexity for scatter.

* Σ is the standard symbol for "sum of." The appended equation, then, means $a_0 + a_1x + a_2x^2 + a_3x^3 + \cdots$. If an upper limit to n is specified, the result will be a polynomial. To specify an infinite series, the sign ∞ is given as the upper limit. The lower limit need not be zero, of course, but note that $\sum_{n=-2} a_n x^n$, say, is infinite at x equal to zero.

PROBLEMS

1. In an experiment designed to determine the relation between two variables, the following results were obtained.

x (lb)    y (in.)
10        −1.30
20        −0.91
30        −0.49
40        −0.03
50        0.34
60        0.81
70        1.16
80        1.65

Plot a graph showing the relation between x and y; use a black thread and express the relation in the form y = mx + C.
Answer: y = 0.0417x − 1.731

2. By using the method of the residual plot, find the resistance $R_0$ at 0°C and the temperature coefficient α, where the resistance is $R = R_0(1 + \alpha t)$, for a sample of wire giving the following data:

R (ohm)   2.0700   2.1465   2.2225   2.2990   2.3750   2.4515
t (°C)    10       20       30       40       50       60

3. Observations are made of the excess temperature θ of a cooling body as a function of the time t. Determine whether the data follow Newton's law of cooling, $\theta = \theta_0 e^{-bt}$, and if so, determine the constants $\theta_0$ and b by graphical methods.

t (min)   0      4      8      12     16     20
θ (°C)    40.5   27.0   18.0   12.0   8.0    5.3

Answer: $\theta_0$ = 40.5°C, b = 0.1015 min⁻¹

4. Assume that the following data have been obtained from a laboratory experiment in which the time intervals were known to a high degree of precision.
There is reason to believe that the data follow one of the two theoretical equations:

$$y = \frac{a + bx}{x} \qquad \text{or} \qquad y = \frac{a + bx^2}{x}.$$

The data are: x (sec), y (cm): 10.025, 2.006, −2.006, −5.012 …

Determine by plotting which equation fits the data better, and determine the values and the units of the constants a and b in the better equation by using the method of the residual plot.
Answer: Prefer the second form; a = 12.0311 cm·sec, b = 2.00514 cm/sec.

5. Find the constants A and B of Cauchy's equation, $\mu = A + B/\lambda^2$, for the index of refraction of light through a refracting medium. The data given are for sodium light refracted through water. Does the equation appear to be inaccurate in any region? If so, state which region.

μ        1.4040   1.3435   1.3371   1.3330   1.3307   1.3289
λ (Å)    2144     3968     4861     5893     6708     7685

Answer: The equation is very poor at the shortest wavelength. An expanded plot of the rest of the data shows that the equation is not very good anywhere. Nevertheless, in the region of the longer wavelengths, A = 1.3238 and B = 3.126 × 10⁻¹¹ cm². Note that the problem implies that one is to plot μ vs. 1/λ². An alternative method, plotting μλ² vs. λ², does not show up the inadequacies of the equation.

CHAPTER 5 ERRORS

Numerical quantities which are found in handbooks, or which are used in industrial and experimental work, are in the last analysis determined by some measurement, either direct or indirect. Whenever measurements are made, errors are present. The accuracy of a measurement is increased by reducing the magnitude of the errors, but never by eliminating all of them: if, for example, all systematic errors have been eliminated in a particular measurement, certain accidental errors always remain. Errors, then, are divided into two general classes: systematic errors and accidental, or chance, errors. Generally, we associate the first type with "accuracy" and the second type with "precision."

5-1 SYSTEMATIC ERRORS

Systematic errors may be divided into several classes: theoretical, instrumental, and personal errors.

Theoretical errors. Errors due to the expansion of the measuring tape with temperature, the buoyant effect of air on weights in the chemical balance, the refraction of light in surveying, air friction in gravitational-acceleration measurements, or the change in the voltage being measured caused by the current drawn by the voltmeter are all errors of this class. They are reproducible effects and are often of greater magnitude than the chance errors that would be found in such measurements; they constitute errors when unrecognized. Once recognized, they can be measured if necessary or, more generally, eliminated by calculating the magnitude of their effect and making the correction. For example, if an important distance in an apparatus is defined by the length of a steel bar of known coefficient of thermal expansion whose temperature changes during the course of the experiment, then any measurement with this apparatus contains a systematic error if that temperature change is not taken into consideration. The error is eliminated by measuring the length at one temperature and calculating it for other temperatures.

Instrumental errors. Errors in the division of graduated scales on instruments, the eccentricity of graduated circles which should be concentric, the inequality of balance arms, and inaccuracy in the calibration of electrical indicating instruments or thermometers are errors of this class. These errors can be eliminated or reduced by applying experimentally determined corrections or by performing the experiment in such a manner as to eliminate their effects.

Personal errors. These errors are due to personal peculiarities of the observer, who may always answer a signal too soon or too late, or who may always estimate a quantity to be smaller than it is, etc.
The character and magnitude of these errors may be determined by a study of the observer: his "personal equation" may be obtained and his observations corrected for these sources of error.

Mistakes. Strictly speaking, mistakes are not errors, for mistakes will not occur when the observer takes sufficient care, whereas errors cannot be eliminated by care alone. However, the best observers will occasionally relax their vigilance and make mistakes. Usually these mistakes can be detected if every observation is recorded directly on a data sheet or in a data notebook and not altered. The figure 8 is frequently mistaken for the figure 3, and vice versa. An angle of 52° may be read as 48° when the observer notes that it is just two degrees away from 50°. Such mistakes are usually obvious in the original data but may be covered up if the results of calculations are recorded instead of the original data. For this reason it is imperative to record the actual readings, not just the difference between two readings. It is also important to avoid erasing any entry which may be thought to be wrong, since it may later turn out to have been correct. When an observation is thought to be mistaken, it is best to draw a neat line through it to indicate the desire to reject it.

5-2 CHANCE ERRORS

Chance errors are due to irregular causes. These errors are most likely to appear when accurate measurements are being made, for when instruments are built or adjusted to measure small quantities, the fluctuations in the observations become more noticeable. For example, if one wishes to measure the length of a laboratory table to the nearest foot, there would be no excuse for any observation differing from any other observation, no matter how many times the measurement is made.
However, if one attempts to measure the length of the same table to the nearest 1/1000 in., the individual observations will be sure to differ greatly among themselves, provided that these observations are performed with care and without prejudice.

When all of the systematic errors have been eliminated or reduced to a minimum, the remaining chance errors will be the chief source of imprecision. Fortunately, most chance errors have been found to follow a definite mathematical law, called the normal or Gaussian distribution of errors. When the errors are distributed in such a predictable way, they can be examined so as to increase the reliability of the results. By studying the distribution of errors and the laws governing the probability of their occurrence, one may greatly increase the precision of his experimental results. The treatment of chance errors forms a large part of the subject of precision of measurements. The laws of probability will be treated in the next chapter as a prelude to the treatment of chance errors.

CHAPTER 6 PROBABILITY

In order to understand the treatment of chance errors it is not necessary to have a highly sophisticated knowledge of the laws of probability, but a basic knowledge is necessary. The following brief treatment will suffice for the purpose. If a more advanced understanding of the subject is desired, one should consult standard texts [1, 2]* on the subject.

There are two general methods for determining the probability that an event will occur. The analytical method is appropriate when the event can be broken down into more basic events whose probabilities of occurrence are known on some a priori basis. When this cannot be done, it is necessary to resort to the experimental method. These two methods will be introduced in the order in which they are mentioned here, but in most of the subsequent illustrative discussion we shall return to the analytical method.
6-1 ELEMENTARY LAWS OF CHANCE: THE ANALYTICAL METHOD

The laws of chance, which apply to many situations, including chance errors in measurement, are most easily understood by considering the tossing of perfect coins and the throwing of perfect dice. Here the basic, elementary event is the appearance of a particular face of a single coin or a single die. In each case there exists an a priori basis by which the probability of the occurrence of such an event can be determined. If a perfect coin of negligible thickness or with a rounded edge is tossed once, only one of two possible events can occur. Hence the probability of the occurrence of either is just 1/2. Similarly, the probability of the appearance of any one face of a perfectly balanced die tossed once is just 1/6.

* Numbers in brackets are keyed to the References at the end of the book.

When a complex event, such as the appearance of five heads when seven coins are tossed, is broken down and discussed on the basis of the a priori probability of the occurrence of a head when a single coin is tossed once, it is the analytical method that is being applied.

6-2 THE EXPERIMENTAL METHOD

This method can be introduced with an example in which it is not actually needed. The example is provided by the data of Table 6-1, which shows the results of successively greater numbers of tosses of a coin. The basis of the experimental method lies in the statement that, in general, the probability of an event occurring in one trial is defined as the limit of the ratio of the number of occurrences to the number of trials when the latter increases indefinitely. Since, in fact, no coin, for instance, can be known absolutely to be perfect, the choice between the analytical and the experimental methods must depend on what is at stake.
The application by gamblers of the experimental method to a roulette wheel which paid on the basis of the analytical method has been known to force the substitution of a new wheel.

Table 6-1
Number of throws   Heads   Tails   Difference   Ratio
10                 7       3       4            0.700
20                 12      8       4            0.600
40                 23      17      6            0.575
100                55      45      10           0.550
200                108     92      16           0.540
500                257     243     14           0.514
1000               513     487     26           0.513
2000               1016    984     32           0.508

There is a further interesting observation to be made from the data of Table 6-1. As the number of throws increased, the ratio did approach 0.5, but the difference between the number of heads and the number of tails thrown tended to increase in this trial. This possible behavior of the difference, even with a perfect coin, is something that the layman seldom takes into account. When an individual has lost money on a game of chance in which his probability of winning is 0.5, he frequently continues to play on the assumption that the laws of probability guarantee that he will come out even in the long run. This assumption is fallacious, for it should be obvious that a coin has no "memory" which can cause its future action to be governed by its past. The laws actually predict that the individual is quite likely to lose more and more.

The experimental method is, of course, very widely used in situations where there can be no basis for even attempting to apply an analytical method. It is used, for example, in making mortality tables for the guidance of life insurance companies. From the number of deaths occurring in each age group in the past, these companies can predict with reasonable accuracy their risk in insuring the lives of individuals. In this case, fortunately for the insurance companies, and presumably because of continued progress in medicine, the method has generally predicted higher death rates than have been experienced. The analytical method could not be used in this case.
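The behavior of the ratio and the difference is easy to reproduce. The Python sketch below first recomputes the Difference and Ratio columns of Table 6-1 from the heads and tails counts (for 1000 throws, 513 − 487 = 26), then tosses a simulated perfect coin using the standard-library pseudorandom generator with a fixed, arbitrary seed. For a perfect coin the standard deviation of heads minus tails after n tosses is √n, so the ratio settles toward 0.5 while the typical size of the difference keeps growing; the particular simulated numbers depend on the seed and carry no significance.

```python
import math
import random

# (throws, heads, tails) as recorded in Table 6-1.
TABLE_6_1 = [
    (10, 7, 3), (20, 12, 8), (40, 23, 17), (100, 55, 45),
    (200, 108, 92), (500, 257, 243), (1000, 513, 487), (2000, 1016, 984),
]

# Recompute the Difference and Ratio columns.
rows = [(n, heads - tails, heads / n) for n, heads, tails in TABLE_6_1]
for n, diff, ratio in rows:
    print(f"{n:5d} throws: difference = {diff:3d}, ratio = {ratio:.3f}")

def simulate(n_tosses, seed=1):
    """Toss a simulated perfect coin n_tosses times.

    Returns (fraction of heads, heads minus tails)."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses, 2 * heads - n_tosses

for n in (100, 10_000, 100_000):
    ratio, diff = simulate(n)
    print(f"{n:7d} tosses: ratio = {ratio:.4f}, heads - tails = {diff:5d}, "
          f"sqrt(n) = {math.sqrt(n):.0f}")
```

Comparing the difference column with √n for each run gives a feel for how large a "run of bad luck" a perfect coin routinely produces.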
Companies that manufacture large quantities of their products frequently inspect only a small percentage of them. For the inspection of small quantities they can afford to use more careful methods than they could for the inspection of the entire output. Tests have shown that the inspection of properly chosen samples gives a more reliable determination of the uniformity of the product than has been obtained by inspecting the entire output.

6-3 COMPOUND PROBABILITIES: THE ANALYTICAL METHOD

The analytical method of determining the probability of a given event is applicable when that event can be broken down into more basic independent events which have known probabilities. For example, when any perfect coin is shaken up and tossed on a carpet, the probability of its landing tail up is the same as for any other perfect coin and, in particular, has the value 0.5. If two perfect coins, e.g., a nickel and a penny, are tossed, the following four combinations are equally probable: head-head, head-tail, tail-head, and tail-tail. The probability of any one of these combinations occurring is then 1/4, or 0.25. Since two of these four combinations contain a head and a tail, the probability of such a combination is 2/4 = 1/2. Likewise, the probability of not getting two heads is 3/4, since three of the four combinations do not contain two heads.

As another example, consider the probability that two fours will come up when two dice are thrown. The probability of getting a four on any single die is 1/6, so the probability of getting two fours with two dice is 1/6 × 1/6 = 1/36. This can be seen from the fact that there are 36 different ways the two dice can fall and only one of these ways produces two fours. Events of the sort in the examples above will be called compound events. The probability of a compound event happening is the product of the probabilities of the independent events that make up the compound event.
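The counting arguments of this section can be checked by brute force. The Python sketch below enumerates the equally likely outcomes for two coins and for two dice with `itertools.product`, counts the favorable ones, and keeps the results as exact fractions; the helper name `probability` is illustrative, not taken from the text.

```python
from fractions import Fraction
from itertools import product

def probability(outcomes, event):
    """Fraction of equally likely outcomes for which event(outcome) is true."""
    outcomes = list(outcomes)
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

coins = list(product("HT", repeat=2))          # (nickel, penny)
dice = list(product(range(1, 7), repeat=2))    # (first die, second die)

p_one_head_one_tail = probability(coins, lambda o: sorted(o) == ["H", "T"])
p_not_two_heads = probability(coins, lambda o: o != ("H", "H"))
p_two_fours = probability(dice, lambda o: o == (4, 4))

print(p_one_head_one_tail, p_not_two_heads, p_two_fours)

# Product rule: P(four on first die) * P(four on second die).
p_four = Fraction(1, 6)
assert p_two_fours == p_four * p_four
```

The enumeration and the product rule agree, as they must when the component events are independent.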
The rule for compound probabilities can be shown more generally as follows. Suppose that the first event can happen in A ways and fail in B ways, all the ways being equally probable, while the second event can happen in A' ways and fail in B' ways, all of which are equally probable. The probability of the first event happening is

$$\frac{A}{A + B}$$

and that of the second event happening is

$$\frac{A'}{A' + B'}.$$

For each of the A ways the first can happen, the second can happen in A' ways. Therefore the number of ways both can happen is AA'. Likewise, for each of the (A + B) ways the first can either happen or fail, the second can either happen or fail in (A' + B') ways, and, therefore, the two can either happen or fail in (A + B)(A' + B') ways. Thus the probability that both will happen is

$$\frac{AA'}{(A + B)(A' + B')} = \frac{A}{A + B}\cdot\frac{A'}{A' + B'}.$$

This establishes the rule for the probability of a compound event made up of two independent events.

Before considering events compounded from more than two simple events, it is helpful to relate this discussion to the four different compound events that are possible in the two-coin example. The number of ways each of the four events can happen is shown in Table 6-2, together with the probabilities of their happening.

Table 6-2
Compound event                     Number of ways                             Probability
(1) Both happen                    AA'                                        AA'/[(A + B)(A' + B')]
(2) Both fail                      BB'                                        BB'/[(A + B)(A' + B')]
(3) First happens, second fails    AB'                                        AB'/[(A + B)(A' + B')]
(4) First fails, second happens    BA'                                        BA'/[(A + B)(A' + B')]
Total (either happen or fail)      AA' + BB' + AB' + BA' = (A + B)(A' + B')   1

Here, the prime is assigned to one coin, no prime to the other. Each letter will have the value unity. "Happen" could mean the appearance of a head and "fail" the appearance of a tail, or vice versa, so long as consistency is maintained. As should be expected, since one of these four must happen, the sum of their probabilities is unity, which corresponds to certainty.
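The derivation can also be mirrored directly in code: list the A + B ways of the first event and the A' + B' ways of the second, pair every way of one with every way of the other, and count. In the Python sketch below the parameter names A2 and B2 stand for A' and B', and the example counts (a four, or not a four, on each of two dice) are chosen only for illustration.

```python
from fractions import Fraction

def compound_table(A, B, A2, B2):
    """Probabilities of the four compound events of Table 6-2.

    Event 1 happens in A ways and fails in B ways; event 2 happens in
    A2 ways and fails in B2 ways (A2, B2 stand for A', B'). All ways
    are taken to be equally probable."""
    ways1 = ["happen"] * A + ["fail"] * B
    ways2 = ["happen"] * A2 + ["fail"] * B2
    counts = {}
    for w1 in ways1:              # every way of the first event ...
        for w2 in ways2:          # ... paired with every way of the second
            counts[(w1, w2)] = counts.get((w1, w2), 0) + 1
    total = len(ways1) * len(ways2)   # (A + B)(A' + B')
    return {k: Fraction(v, total) for k, v in counts.items()}

table = compound_table(A=1, B=5, A2=1, B2=5)   # "a four" on each of two dice
for event, prob in sorted(table.items()):
    print(event, prob)
print("sum =", sum(table.values()))
```

Setting every count to 1 reproduces the two-coin case, in which each of the four compound events has probability 1/4.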
It is seen that if one specifies only the event "one happens and one fails," rather than which is to happen and which is to fail (the latter being items 3 and 4 of Table 6-2), then this event can happen in two ways.

The proof can easily be extended to include compound events requiring the occurrence of any number of simple events, because it is valid when either of the two events is itself compound. For three events having probabilities of occurrence $p_1$, $p_2$, and $p_3$, the probability that the first two will occur is $p_1p_2$. Next, consider the occurrence of the first two events as a single event that may be combined with the third. Thus the probability that all three occur is $(p_1p_2)p_3 = p_1p_2p_3$. This process can be continued to include any number of events.

6-4 PROBLEM OF n DICE

Special consideration should be given to a particular type of compound event: one involving a large number of simple events, each of which has the same probability of occurrence. Let us call this problem the "problem of n dice," because it is so easily illustrated and treated by considering n identical dice with identical markings. One should keep in mind that the solutions to this problem are applicable to a great many situations not involving dice. In particular, we will find it to be directly applicable to certain problems involving chance errors.

Let us first work out a problem in which five identical dice are thrown. Assume that each of these dice has two of its six faces painted red while the rest are white, and that none of the faces have dots or other markings. If these five dice are shaken and thrown, it is obvious that the probability of all five coming up with red faces is $\frac{1}{3} \times \frac{1}{3} \times \frac{1}{3} \times \frac{1}{3} \times \frac{1}{3} = (\frac{1}{3})^5$ and the probability of them all coming up with white faces is $(\frac{2}{3})^5$. If one asks, "What is the probability of their coming up with two red faces and three white?", the problem becomes more complicated.
Although the combination containing either all red faces or all white faces can be achieved in only one way, the combination containing two red and three white faces can be achieved in ten different ways, as shown in Table 6-3. The letter R in row 2 and column 7 indicates that the No. 2 die comes up with a red face in the 7th combination; the No. 3 die is white in this combination. The probability of the first combination is $\frac{1}{3} \times \frac{1}{3} \times \frac{2}{3} \times \frac{2}{3} \times \frac{2}{3} = (\frac{1}{3})^2(\frac{2}{3})^3$ and the probability of the second combination is $\frac{1}{3} \times \frac{2}{3} \times \frac{1}{3} \times \frac{2}{3} \times \frac{2}{3} = (\frac{1}{3})^2(\frac{2}{3})^3$. Each of the ten combinations has the same probability of occurrence.

Table 6-3
              Combination
Die    1  2  3  4  5  6  7  8  9  10
1      R  R  R  R  W  W  W  W  W  W
2      R  W  W  W  W  W  R  W  R  R
3      W  R  W  W  W  R  W  R  R  W
4      W  W  R  W  R  W  W  R  W  R
5      W  W  W  R  R  R  R  W  W  W

Since there will be two red and three white faces if any one of the ten combinations occurs, the probability that just two red and three white faces come up is ten times the probability of occurrence of any one combination, or $10(\frac{1}{3})^2(\frac{2}{3})^3$.

The application of this treatment to events not involving dice is rather obvious. If five different events each have a probability of occurrence of 1/3, their probabilities of failing to occur will each be 2/3. The probability that two of these events will occur and the other three fail to occur is $10(\frac{1}{3})^2(\frac{2}{3})^3$.

The solution of the general problem follows the same procedure. Assume that there are n elements, and that a trial with any one of them has a probability of success p and a probability of failure q. Then p + q = 1. The probability of success with each element in a single, simultaneous trial of all n of them is $p^n$, and the probability that all fail is $q^n$. Again the problem will be more complex if one asks the probability of just r of the n events occurring and the other (n − r) events failing because, as with the examples, such a result can be achieved by various combinations.
As before, each of these combinations has the same probability of occurrence, namely, $p^rq^{n-r}$. In order to find the answer to the problem in question it will be necessary to find the number of combinations that will produce the result desired, that is, the number which corresponds to our number 10 above. In the problem of the dice above, 10 represents the total number of combinations possible when two things are selected from five; there are 10 different ways in which just two of the five painted dice can come up red. In the general case of n possible events one must find the number of different ways in which the particular r events he is investigating can occur. This number is usually referred to in the abbreviated form "the combination of n things taken r at a time," and is designated by the symbol C(n, r). The probability, then, that just r of the n equally probable events will occur and the other (n − r) events fail is

$$\mathcal{P}(n, r) = C(n, r)\,p^rq^{n-r}. \tag{6-1}$$

The subject of combinations is well treated in most algebra texts. The principles involved also underlie the problem of arrangement and are essential for much of the work in probability. Since probability in turn lies at the root of the study of chance errors in measurement, we shall treat all these related topics in the next paragraphs.

6-5 PERMUTATIONS AND COMBINATIONS

Suppose we have 5 toy building blocks lettered A, B, C, D, and E and a box divided into 3 compartments, each of which will hold just one block. The number of different ways in which we can arrange the 5 blocks in the 3 compartments is the "number of permutations of 5 things taken 3 at a time"; it is designated P(5, 3). On the other hand, if we have a bag that will hold just 3 of these blocks, the number of different bagfuls that can be selected from these five blocks will be less than the number of ways in which three blocks can be arranged in the box, since it is impossible to distinguish any ordering in the bag.
This number of different bagfuls is the "number of combinations of 5 things taken 3 at a time," and is designated C(5, 3). It is easier to derive the equation for C(n, r) from the equation for P(n, r) than to derive it directly. We shall, therefore, treat P(n, r) first.

Consider the problem of calculating P(5, 3) by referring to Table 6-4. Any permutation must have one of the five blocks in the first compartment or space, so that there are 5 choices for the first space. When this space is filled there are only 4 choices for the next space, but there are 4 second-space choices for each of the 5 first-space choices. Thus there are 5 × 4 = 20 choices for the first two spaces. When the first two spaces are filled there are only three blocks to choose from for the third space, but there are 3 third-space choices for each of the 20 choices for the first two spaces. Thus there are 20 × 3, or 5 × 4 × 3, choices for the three spaces. That is, P(5, 3) = 5 × 4 × 3 = 60. These 60 choices are listed in Table 6-4.

Table 6-4
ABC  BAC  CAB  DAB  EAB
ABD  BAD  CAD  DAC  EAC
ABE  BAE  CAE  DAE  EAD
ACB  BCA  CBA  DBA  EBA
ACD  BCD  CBD  DBC  EBC
ACE  BCE  CBE  DBE  EBD
ADB  BDA  CDA  DCA  ECA
ADC  BDC  CDB  DCB  ECB
ADE  BDE  CDE  DCE  ECD
AEB  BEA  CEA  DEA  EDA
AEC  BEC  CEB  DEB  EDB
AED  BED  CED  DEC  EDC

It is a simple matter to pass from the special case to the general, where there are r spaces to fill with n different things. In this case there are n choices for the first space, and for each of these n choices there are (n − 1) choices for the second space. This gives n(n − 1) choices for the first two spaces. For the third space there are (n − 2) choices for each of the n(n − 1) choices for the first two spaces. At the rth space one finds that there are (n − r + 1) choices for each choice for the previous spaces. Thus

$$P(n, r) = n(n-1)(n-2)\cdots(n-r+1). \qquad (6\text{-}2)$$

This is usually the simplest form for calculations,* but the equation can be made more compact by multiplying the right-hand side by $(n-r)!/(n-r)!$, which is unity. Then Eq. (6-2) becomes

$$P(n, r) = n(n-1)(n-2)\cdots(n-r+1)\,\frac{(n-r)!}{(n-r)!} = \frac{n!}{(n-r)!}.$$

* This statement will be true when n and r are of the magnitudes implied by the examples being discussed. When n and r are very large, approximation methods, to be referred to later, become almost imperative.

In the above problem it was seen that any given combination of three letters, such as A, B, and C, can appear in several different permutations, CAB, BAC, and CBA, for example. There are always more permutations than combinations; in fact, the total number of permutations can be obtained by simply multiplying the number of combinations of n things taken r at a time by the number of ways those r things in each combination can be permuted, that is, by the number of permutations of r things taken r at a time. From Eq. (6-2) we see that P(r, r) = r!.* Thus P(n, r) = r! C(n, r), or

$$C(n, r) = \frac{P(n, r)}{r!} = \frac{n!}{(n-r)!\,r!}. \qquad (6\text{-}3)$$

* For these equations to be valid for the special case n = r, one must take 0! = 1. The proof of the validity of this latter equation is outside the subject matter of this book.

In connection with the problem solved in Section 6-4, as well as for use in the next section, we note that the binomial expansion of $(q + p)^n$ can be written as

$$(q + p)^n = \sum_{r=0}^{n} C(n, r)\,q^{n-r}p^r. \qquad (6\text{-}4)$$

Each term of this summation represents the probability of a particular compound event happening. The term for a given r is identical to Eq. (6-1) and therefore gives the probability of just r of n events happening, given that the probability of each happening is p and the probability of each failing is q.
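The counting in this section can be checked by brute force. The sketch below assumes Python 3.8 or later (for `math.perm` and `math.comb`); the helper name `falling_factorial` is ours, introduced only for illustration. `itertools.permutations` happens to list the arrangements in the same order as the columns of Table 6-4 read top to bottom, and the last line checks numerically that the terms of Eq. (6-4) add up to one.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

def falling_factorial(n, r):
    """P(n, r) = n(n - 1)(n - 2)...(n - r + 1), Eq. (6-2)."""
    result = 1
    for k in range(n, n - r, -1):
        result *= k
    return result

blocks = "ABCDE"
# The box: the order in the three compartments matters.
arrangements = ["".join(t) for t in permutations(blocks, 3)]
# The bag: no ordering can be distinguished.
bagfuls = ["".join(t) for t in combinations(blocks, 3)]

print(len(arrangements), falling_factorial(5, 3), perm(5, 3))             # 60 60 60
print(len(bagfuls), falling_factorial(5, 3) // factorial(3), comb(5, 3))  # 10 10 10
print(arrangements[:12])  # first column of Table 6-4: ABC, ABD, ..., AED

# Eq. (6-4): the terms C(n, r) q**(n - r) p**r add up to (q + p)**n = 1.
n, p = 10, 1 / 6
print(sum(comb(n, r) * (1 - p) ** (n - r) * p**r for r in range(n + 1)))  # 1.0 to within rounding
```

Dividing P(5, 3) by 3! = P(3, 3) is exactly the step that leads to Eq. (6-3).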
The probability of all events failing (r = 0) is $C(n, 0)\,q^n p^0 = q^n$, the probability of just one event occurring, for which r = 1, is $C(n, 1)\,q^{n-1}p = nq^{n-1}p$, and so on. In the summation there is one term for every possible number of the n events that can occur. Therefore, the sum of all these terms should be one, or certainty. Since q + p = 1, $(q + p)^n = 1$, so the summation does equal one.

6-6 THE BINOMIAL DISTRIBUTION

Equation (6-4) represents what is known as the binomial distribution. We use the term "distribution" because, through the variation in the values of r, the equation represents the distribution of probability among these events. We discussed the meaning of this term in the last section. We shall give two detailed examples here.

Consider the tossing of 10 good coins, and suppose that we want to know the number of heads which appear. We refer to each of the 10 coins as an element; the probability of success in a single trial of a single element is p = ½. The solution is attained by describing the probabilities of seeing no heads, 1 head, 2 heads, etc., in a single trial of the 10 elements. These probabilities are given by the successive terms of Eq. (6-4). For this example q is also ½, so that the products $q^{n-r}p^r$ are always $(\tfrac{1}{2})^{10} = \tfrac{1}{1024}$. To be more specific, the term for r = 0 is $C(10, 0)(\tfrac{1}{2})^{10}(\tfrac{1}{2})^{0}$, that for r = 1 is $C(10, 1)(\tfrac{1}{2})^{9}(\tfrac{1}{2})^{1}$, etc.; the value of C(10, r) is given by Eq. (6-3). Thus we can determine the binomial distribution $\mathcal{P}(10, r)$ by evaluating all such terms for r = 0 through 10. (See Fig. 6-1.)

The distribution shown in Fig. 6-1 is symmetric, but this is not a general property of the binomial distribution; it is true only for this special case in which p = q = ½. For example, consider the rolling of 10 dice, with attention directed to the number of aces which appear. Here the probability of success in a single trial of a single element is p = 1/6 and, of course, q = 5/6.
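The procedure of evaluating each term of Eq. (6-4) for r = 0 through 10 is easy to mechanize. The sketch below does it for both examples, 10 coins with p = ½ and 10 dice with p = 1/6; the function name `binomial_term` is a hypothetical label for Eq. (6-1). Rounded to four places, the printed columns should match the bar heights labelled in Figs. 6-1 and 6-2.

```python
from math import comb

def binomial_term(n, r, p):
    """Eq. (6-1): probability of exactly r successes among n elements."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

coins = [binomial_term(10, r, 1 / 2) for r in range(11)]  # heads among 10 good coins
dice = [binomial_term(10, r, 1 / 6) for r in range(11)]   # aces among 10 dice

print(" r  coins   dice")
for r in range(11):
    print(f"{r:2d}  {coins[r]:.4f}  {dice[r]:.4f}")
```

The coin column is symmetric about r = 5 while the dice column is not, which is the point made in the text.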
If exactly the same procedure is carried out for this problem as for the previous one, we will obtain Fig. 6-2. Evidently the distribution would be exactly the same if we were interested in any other single face of a die.

We can now ask, "What is the average number of occurrences of a single event, or what is the 'expectation value' for this event, in many trials with the n elements?" It is clear from Fig. 6-1 that the most probable number of heads per trial is 5, and from Fig. 6-2 that the most probable number of aces in a single roll of 10 dice is 1. However, the average numbers in each case are not necessarily the same as these most probable numbers.

[Figure 6-1: bar chart of $\mathcal{P}(10, r)$ against r for q = p = ½; labelled bar heights 0.0010, 0.0098, 0.0439, 0.1172, 0.2051, 0.2461 for r = 0 through 5, mirrored for r = 6 through 10.]

FIGURE 6-1. The bar at r = 6, for instance, has a height proportional to the probability of finding 6 heads when 10 good coins are tossed. Since the width of each bar is unity, the areas of the bars are also proportional to the probabilities.

The term "average" implies that a large number N of trials must be made, each involving the n elements. The numbers of occurrences of the event in question must be added, and this sum must be divided by N. It is at this point that one of the basic problems in the treatment of chance errors can be introduced. It is evident that in any experiment, the number N must be finite. It is obviously foolish to consider carrying on the same experiment indefinitely; fortunately it is also not necessary to contemplate such a hopeless idea. One might almost say that the object of a book such as the present one is to demonstrate this latter fact. It is certainly true that if one were to carry out an averaging process with a finite number of trials, and then to repeat it, he could not expect to obtain exactly identical results.
That is, a difference is always to be expected between an experimental average and the theoretical expectation value of an analytically determinable distribution of probability. The problem at hand, however, is to determine the expectation value for the binomial distribution; we shall come to the difference between it and an experimentally determined average later.

[Figure 6-2: bar chart of the binomial distribution against r for p = 1/6, q = 5/6, n = 10; labelled bar heights 0.3230, 0.2907, 0.1550, 0.0543, 0.0130, 0.0022 for r = 1 through 6.]

FIGURE 6-2

By its very definition, we can determine the expectation value by adding the products of each possible number of successes and the analytically defined probability that that number of successes will occur. Since the sum of the probabilities is one, no subsequent division by a number of trials is necessary. If the expectation value is called $\mu$, then by the above definition,

$$\mu = \sum_{r=0}^{n} r\left[C(n, r)\,q^{n-r}p^r\right].$$

The factor in brackets is not zero for any value of r. But the first term of the sum is zero, since r itself is zero in this term. Thus

$$\mu = \sum_{r=1}^{n} r\,\frac{n!}{r!\,(n-r)!}\,q^{n-r}p^r.$$

Next we note that r/r! = 1/(r − 1)!. It is convenient then to let y = r − 1 and note that when r = n, y = n − 1. We shall designate this latter value of y by l. Then we have

$$\mu = \sum_{y=0}^{l} \frac{n\,(l!)}{y!\,(l-y)!}\,q^{l-y}p^{y+1},$$

where n! has been factored into n[(n − 1)!]. If the fixed numbers n and p, factors which are the same in all terms of the sum, are brought out, we obtain

$$\mu = np\sum_{y=0}^{l} \frac{l!}{y!\,(l-y)!}\,q^{l-y}p^{y}.$$

But by Eqs. (6-1), (6-3), and (6-4) we find that the sum is just $(q + p)^l$, so that

$$\mu = np, \qquad (6\text{-}5)$$

since $(q + p)^l = 1$ regardless of the value of l.

In the symmetric distribution of Fig. 6-1, $\mu = 5$, which is also the most probable number of heads. In fact, the analytical average, i.e., the expectation value, is identical to the most probable event for any symmetric, single-peaked distribution. In the case of the asymmetric distribution of Fig. 6-2, however, $\mu = 1.667$, which, being nonintegral, is not one of the possible events.
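Both halves of the argument can be illustrated numerically. The sketch below computes $\mu$ for the 10-dice case by adding r times its probability, as in the definition above, and compares it with np from Eq. (6-5). It then averages a finite number N of simulated trials; using Python's pseudorandom generator as a stand-in for real dice is an assumption of the sketch, and the function names are hypothetical.

```python
import random
from math import comb

def binomial_term(n, r, p):
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def expectation(n, p):
    """Add r times its probability over r = 0..n, as in the definition of mu."""
    return sum(r * binomial_term(n, r, p) for r in range(n + 1))

def experimental_average(n, p, trials, rng):
    """Average number of successes in `trials` simulated trials of n elements."""
    total = sum(sum(rng.random() < p for _ in range(n)) for _ in range(trials))
    return total / trials

n, p = 10, 1 / 6  # ten dice, attention directed to aces
rng = random.Random(1)  # fixed seed so the sketch is repeatable
print(expectation(n, p), n * p)                # both about 1.6667, Eq. (6-5)
print(experimental_average(n, p, 1000, rng))   # near, but not equal to, 1.6667
```

With N = 1000 the experimental average is a multiple of 0.001, so it can never equal 5/3 exactly; a different seed gives a different, equally legitimate, average.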
The problem raised earlier concerning the differences between the expectation value and an experimental average must be discussed with reference to the width of the distribution. We shall defer this discussion to a later chapter.

6-7 THE POISSON DISTRIBUTION

The well-known and very important probability distribution function called the Poisson distribution can be easily derived from the binomial distribution. It is obtained when p, the probability of success in a single trial of a single element, is extremely small but the number of elements involved in a single experimental trial is so great that a measurable success is assured. Thus the Poisson distribution is deduced from the binomial distribution by assuming that n increases without bound and p decreases toward zero, but in such a way that the product np, the expectation value, remains finite. The same script $\mathcal{P}$ will be used to designate this new distribution, but it will be given in terms of $\mu$ and r instead of n and r. That is, with $p = \mu/n$,

$$\mathcal{P}(\mu, r) = \lim_{n\to\infty} \frac{n!}{(n-r)!\,r!}\left(\frac{\mu}{n}\right)^{r}\left(1 - \frac{\mu}{n}\right)^{n-r},$$

where we have used the fact that q = 1 − p. Now,

$$\frac{n!}{n^r\,(n-r)!} = \frac{n(n-1)(n-2)\cdots(n-r+1)}{n^r} = 1\cdot\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{r-1}{n}\right),$$

where the second form on the right-hand side is obtained by noting that there are just r factors in the numerator of the first form. As n gets larger without bound, this whole expression approaches one. The other factor in $\mathcal{P}$ which contains n is

$$\left(1 - \frac{\mu}{n}\right)^{n-r} = 1 - (n-r)\frac{\mu}{n} + \frac{(n-r)(n-r-1)}{2!}\left(\frac{\mu}{n}\right)^{2} - \frac{(n-r)(n-r-1)(n-r-2)}{3!}\left(\frac{\mu}{n}\right)^{3} + \cdots;$$

we obtain this by binomial expansion. Shifting the n's in the powers of $(\mu/n)$, we get

$$\left(1 - \frac{\mu}{n}\right)^{n-r} = 1 - \left(1 - \frac{r}{n}\right)\mu + \left(1 - \frac{r}{n}\right)\left(1 - \frac{r+1}{n}\right)\frac{\mu^{2}}{2!} - \cdots.$$
Again, as n is increased without bound, this becomes

$$\left(1 - \frac{\mu}{n}\right)^{n-r} \to 1 - \mu + \frac{\mu^{2}}{2!} - \frac{\mu^{3}}{3!} + \cdots = e^{-\mu}.$$

Hence the Poisson distribution is

$$\mathcal{P}(\mu, r) = \frac{\mu^{r} e^{-\mu}}{r!}, \qquad (6\text{-}6)$$

and it is to be noted again that, whereas the binomial distribution is given in terms of the number r of successes, the probability p of success in a single elementary trial, and the number n of elements per experimental trial, the Poisson distribution is given in terms of the number r of successes and the expectation value per experimental trial, $\mu$.

Perhaps the most obvious application of the Poisson distribution is to the field of radioactive counting. Consider the case of a very long-lived alpha-emitting metal such as uranium. The probability p that a single atom of $_{92}\mathrm{U}^{238}$ will "fire" in one second is only about $5 \times 10^{-18}$. On the other hand, a half milligram of this metal will contain about $1.3 \times 10^{18}$ atoms. Thus we have here the case envisaged during the deduction of the Poisson distribution from the binomial distribution; n is very large and p is very small, but they are so related that the product np is finite and has the value 6.5.

[Figure 6-3: $\mathcal{P}(6.5, r)$ plotted against number of counts, r, from 0 to 14, with successive points connected.]

FIGURE 6-3

Suppose that p is the elementary probability of success, success being the observation of a decay or count, and that the single experimental trial is the observation of the 0.5 mg of uranium for one second. When we say that the expectation value is 6.5, we mean that if the numbers of decays observed in many individual one-second intervals are added up and the sum divided by the number of such intervals, we will obtain a number close to 6.5 as the expected result. Again, as in the previous section, the discussion of how close to 6.5 we may expect this result to be will be deferred to a later chapter. Here we shall discuss the analytically determinable values of the distribution.
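The limiting process can be watched numerically. The sketch below holds the expectation value fixed at $\mu$ = 6.5 (the uranium example), sets p = $\mu$/n, and compares the binomial term for r = 6 with Eq. (6-6) as n grows; the choice r = 6 and the function names are illustrative assumptions. It does not attempt n = 1.3 × 10¹⁸ directly, because 1 − 5 × 10⁻¹⁸ rounds to exactly 1.0 in double-precision arithmetic.

```python
from math import comb, exp, factorial

def binomial_term(n, r, p):
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def poisson_term(mu, r):
    """Eq. (6-6): mu**r * e**(-mu) / r!."""
    return mu**r * exp(-mu) / factorial(r)

MU, R = 6.5, 6
for n in (10, 100, 1_000, 100_000):
    # p = MU / n shrinks as n grows, so the product np stays fixed at MU.
    print(n, round(binomial_term(n, R, MU / n), 6), round(poisson_term(MU, R), 6))

# The uranium example: p about 5e-18 per second, n about 1.3e18 atoms.
print(5e-18 * 1.3e18)  # close to 6.5
```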
With $\mu = 6.5$, the probability that r counts will be observed in a one-second interval is given by

$$\mathcal{P}(6.5, r) = \frac{(6.5)^{r} e^{-6.5}}{r!}.$$

Values of this function are given in Table 6-5 and plotted in Fig. 6-3. We can make several observations about this figure: the distribution is not symmetric; and, as with the binomial distribution which described the rolling of 10 dice, the expectation value does not coincide with any of the actually possible observations, which can only be some integral number of counts per one-second interval. It is for this latter reason that, though the points in Fig. 6-3 are connected for illustrative purposes, a smooth curve is not drawn between them.

Table 6-5
 r   $\mathcal{P}(6.5, r)$      r   $\mathcal{P}(6.5, r)$
 0   0.0015         7   0.1462
 1   0.0098         8   0.1188
 2   0.0318         9   0.0858
 3   0.0688        10   0.0558
 4   0.1118        11   0.0330
 5   0.1454        12   0.0178
 6   0.1575        13   0.0089

It is seen, however, that the maximum value does fall near the expectation value. Connecting the points of Fig. 6-3 serves to suggest another property of this distribution: as $\mu$ increases, the fractional or relative difference between $\mu$ and the maximum decreases. Furthermore, if such a distribution is plotted for large values of $\mu$ with more closely spaced abscissa units, so that the curves always lie on coordinate axes of about the same absolute size, the curves will appear smoother than the one depicted in Fig. 6-3.

Another interesting observation, a numerical description of which will have to accompany the later discussion of expected differences between $\mu$ and experimental averages, is that when $\mu$ is 6.5 it is rather improbable that there would be no count, or only one count, or 13 counts in a single one-second interval. One would have to make observations over a large number of one-second intervals to be reasonably sure of observing any intervals in which such large or such small counts occurred.
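Table 6-5 can be regenerated without computing any large powers or factorials by using the recurrence quoted in the hint to Problem 11, $\mathcal{P}(\mu, r+1) = [\mu/(r+1)]\,\mathcal{P}(\mu, r)$, starting from $\mathcal{P}(\mu, 0) = e^{-\mu}$. This is a minimal sketch; the function name `poisson_table` is hypothetical. Rounded to four places, its output agrees with Table 6-5 to within one unit in the last decimal place.

```python
from math import exp

def poisson_table(mu, r_max):
    """Return [P(mu, 0), ..., P(mu, r_max)] using P(mu, r + 1) = mu / (r + 1) * P(mu, r)."""
    values = [exp(-mu)]  # P(mu, 0) = e**(-mu)
    for r in range(r_max):
        values.append(values[-1] * mu / (r + 1))
    return values

table = poisson_table(6.5, 13)
for r, prob in enumerate(table):
    print(f"{r:2d}  {prob:.4f}")
```

The same function with mu = 13 (twice the one-second expectation value) would serve for the 2-sec intervals of Problem 11.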
Based on the values in Table 6-5, we could expect that in an observation of $10^4$ one-second intervals, only about 15 intervals would go by with no count and about 89 with 13 counts, but 1575 intervals would have six counts. If for these $10^4$ trials, say, we had plotted in Fig. 6-3 the numbers 15, 98, 318, etc., rather than the values of $\mathcal{P}(6.5, r)$, we would have a frequency distribution. To think of the problem in this way reemphasizes the difficulty just pointed out. While $10^4$ one-second intervals constitute a sizable fraction of a working day, they are obviously not a sufficiently large number of intervals to provide us with exact experimental knowledge of the distribution; no matter how many observations are made, they are still not sufficient. The sum of the $\mathcal{P}(6.5, r)$ in Table 6-5 is 0.9929. In $10^4$ one-second intervals, then, 9929 of them might be expected to contain a number of counts somewhere between zero and 13. Even if this expectation actually were observed, it is clear that of the 71 intervals which contained more than 13 counts some one of them would contain the greatest number. If n is $1.3 \times 10^{18}$, we cannot expect that another $10^4$ intervals would not include one which contained more than the previous maximum. One must always expect a difference between the experimentally observed distribution and the analytical distribution which is giving rise to the experimental distribution.

Reference to the value of n also points out an obvious but in this case unimportant lack of rigor in the application of the Poisson distribution to an actual experiment in radioactive decay. If an observation of the 0.5 mg of radioactive metal were carried out for the $10^4$ one-second intervals and if the distribution of counts were found to be in accord with the probabilities listed in Table 6-5, then n would have been reduced by at least 63,953, which is $10^4$ times the sum of the products $r\,\mathcal{P}(6.5, r)$ for r = 1 through 13.
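The bookkeeping in the last two paragraphs can be reproduced from the rounded entries of Table 6-5. The sketch multiplies each entry by $10^4$ to get the expected frequency distribution, then totals the intervals with 0 to 13 counts and the decays they represent (each interval with r counts uses up r atoms).

```python
N_INTERVALS = 10_000
# Rounded values of P(6.5, r) from Table 6-5, r = 0 through 13.
TABLE_6_5 = [0.0015, 0.0098, 0.0318, 0.0688, 0.1118, 0.1454, 0.1575,
             0.1462, 0.1188, 0.0858, 0.0558, 0.0330, 0.0178, 0.0089]

# Expected number of one-second intervals showing r counts (a frequency distribution).
expected_counts = [round(N_INTERVALS * prob) for prob in TABLE_6_5]
within = sum(expected_counts)                               # intervals with 0 to 13 counts
decays = sum(r * c for r, c in enumerate(expected_counts))  # atoms used up, r = 1 through 13
print(expected_counts[:3], within, N_INTERVALS - within, decays)  # [15, 98, 318] 9929 71 63953
```

The 63,953 decays are the depletion of n that the next paragraph weighs against $1.3 \times 10^{18}$ atoms.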
Consequently, $\mu$ would not have been constant during the course of the experiment. Though the loss of 63,953 out of $1.3 \times 10^{18}$ would obviously be undetectable, we must remember that the above example was a special case. Most radioactive materials have a much higher value of p than that used here, and present-day scientists are able to manipulate much smaller amounts of material than $1.3 \times 10^{18}$ atoms.

PROBLEMS

1. Given P(n, 3) = 10 P(n, 2), find n.
Answer: 12

2. Given P(n, r) = 272, C(n, r) = 136, find n and r.
Answer: n = 17, r = 2

3. Five different positions are to be filled, and there are 20 different applicants, each applying for any one of the positions. In how many ways can the positions be filled?
Answer: 20!/15!

4. In a certain town there are 4 aldermen to be elected, and there are 8 candidates. How many different tickets can be made up?
Answer: 70

5. In how many ways can a pack of 52 playing cards be divided into 4 hands so that each way will produce a different situation in a bridge game? Note that if any two players exchange hands a different situation is produced.
Answer: $(4!)(52!)/(13!)^{4}$

6. How many different baseball lineups of 9 men each can be chosen from 15 players of whom 8 are qualified to play in the infield only, 5 in the outfield only, and 2 in any position (battery included in infield)?
Answer: 14 × (8!/2!) × (5!/2!)

7. From a bag containing 5 black balls and 4 white balls, three are drawn at random. What is the probability that two are black and one is white?
Answer: 10/21

8. The letters of "front" are shaken up in a bag and three are drawn, one at a time. (a) What is the probability that they will spell "ton" as drawn? (b) What is the probability that the three letters can be arranged to spell "ton"?
Answer: (a) 1/60 (b) 1/10

9. Find the probability of throwing one and only one ace in two trials with a single die.
Answer: 10/36

10. Find the probability of throwing (a) exactly 3 aces in 5 trials with a single die; (b) at least 3 aces in the 5 trials.
Answer: (a) 250/7776 (b) 276/7776

11. Make a table similar to Table 6-5, but for the probability of getting various numbers of counts in 2-sec intervals from the ½-mg sample of $_{92}\mathrm{U}^{238}$. [Hint: Note that, for the Poisson distribution, $\mathcal{P}(\mu, r + 1) = [\mu/(r + 1)]\,\mathcal{P}(\mu, r)$.]
Typical answer: 0.0152 at r = 6, 0.1098 at r = 13

12. Reproduce the result obtained in Problem 11 at r = 2 by using information from Table 6-5.

13. Evaluate, for the Poisson distribution, $\log \mathcal{P}(100, r)$ for r every two units from 94 through 106, and plot the results against r. Considering the shape of the curve, plot the same results against $(99.5 - r)^2$ and evaluate a and b in

$$\mathcal{P} = a \exp\left[-\frac{(99.5 - r)^{2}}{b}\right].$$

(Use Item C of Appendix 3 as a source for factorials and their logarithms.)
Answer: a ≈ 0.040, b ≈ 2μ (Note: If the work was done carefully, it will show up the inherent lack of symmetry in the Poisson distribution.)

14. Repeat Problem 13 for the binomial distribution $\mathcal{P}(200, r)$ with p = q, except compare with $(100 - r)^2$. Use log (200!) = 374.897.
Answer: a = 0.0564, b = 2npq

15. Calculate the distribution to be expected with 6 tetrahedral "dice." Calculate the expectation value for any particular face by averaging directly over the distribution, and compare with Eq. (6-5).

CHAPTER 7

DISTRIBUTION OF CHANCE ERRORS

The concept of distribution of probability was presented in the previous chapter. Such a distribution is a body of information by which one can know the relative likelihood of occurrence of events which differ in detail but are of a similar nature. The event, the appearance of one head in a toss of 10 coins, is different in detail from the appearance of two heads, but both of these are of a similar nature.
On the other hand, one would not ordinarily describe by the same distribution such diverse events as a power failure in New York City and the impact of a meteor on the moon, even though a connection is conceivable. The "body of information" can take many forms, as illustrated by Eq. (6-6), Table 6-5, and Fig. 6-3. Of these three the first contains the most information, and in fact, a mathematical description of a distribution is generally most useful and most convenient.

The occurrence of an error of a particular size in a particular type of measurement is an event of the sort in which we are interested here. Consider the experiment used in Section 6-7 to illustrate the Poisson distribution. The object of such an experiment is usually the determination of the probability per second that a single atom will decay. To carry out the experiment, we must first weigh the sample and then, from the weight and molecular weight of the sample and Avogadro's number, find n. Then we must divide n into $\mu$ to get the desired result, $p = \mu/n$. The estimate of $\mu$ must be made by dividing the total number of counts by the number of one-second intervals in which that number of counts was observed. In the language to be used through most of the remaining chapters of this book, an event other than the one sought is what we mean by the term "error," and the size of the error is the difference between the observed event and the desired (i.e., the average or, perhaps, the most probable) event. This suggests that we can obtain an error distribution curve by a translation of the axis along which events were plotted in Fig. 6-3 to a position where the origin coincides with the desired event. This new curve gives us the error distribution, a picture of the relative probabilities with which different errors can be expected to occur.
Thus in the example of the last chapter we can think of the occurrence of 11 counts in a one-second interval as a positive error and the occurrence of 3 counts as a negative error.

The distribution which is most widely useful for the treatment of errors in scientific measurements is neither of the two introduced so far. We mentioned in Section 5-7 that when systematic errors have been reduced to a minimum, there will be residual errors for which no explanation can be found. Fortunately, these errors follow the laws of chance, so that although they prevent us from having absolutely certain knowledge, a proper consideration of them will lead us to reliable knowledge about the quantities being measured, as well as giving us an idea of the extent of this reliability. In most cases the errors will be just as likely to be positive as negative, and will usually follow the so-called "normal-distribution law," to be introduced in mathematical form in the next chapter; its expected graphical appearance is illustrated in this chapter. However, there are types of measurements in which the chance errors are not normally distributed; the distributions may be symmetrical or nonsymmetrical (that is, in the latter case the errors may be more likely positive than negative, or vice versa). Nonsymmetrical distributions are called skew. Skew distributions are difficult to treat, but they are generally handled by reference to the normal-distribution law, because distributions of averages of groups of measurements from non-normal distributions tend to approach the normal-distribution law as the numbers of measurements included in the groups increase [3]. This important fact will be considered more completely later.

7-1 EXAMPLES OF ERROR DISTRIBUTIONS

To further understand the meaning of a distribution of errors and the properties of these distributions, it is worthwhile to perform a simple experiment in which a normal distribution is usually found.
It is desirable in an experiment for this purpose that accuracy be more easily achievable in the reading than in the setting of the instrument. We shall describe two such experiments here: one is suitable for laboratory use in a formal class; the other can be done by one person with very simple equipment.

For the first one, set up a lighted object and a lens corrected for chromatic aberration on an optical bench so that the image of the lighted object will be between 20 and 40 cm from the lens. The experiment should be performed by two people, an observer and a recorder. The observer moves the screen until the image on it appears sharp to him, and the recorder estimates the position of the screen to the nearest 1/10 mm and records it. The observer then moves the screen well out of focus and oscillates it back and forth to find a new position, which the recorder records in the same way. This procedure is repeated until the desired number of readings is obtained. For the result to be useful, it is desirable to obtain at least 500 and preferably 1000 readings with equal care and skill. Ideally, of course, all measurements should be made by one person so that the same care and skill are applied to each reading. However, students in a class will usually be found to have nearly the same ability in focusing. Thus very useful results can be obtained by combining comparatively smaller numbers of observations obtained by several observers. Each member of the class may act once as recorder and once as observer.

It is necessary that every reading be recorded. Occasionally there will be a reading that seems quite far from the main group, and one may be tempted to conclude that it is a mistake and discard it. This should not be done, for reasons which we shall discuss later.
We shall now describe a method of presenting the observations which is not only excellent for instructional purposes but is also frequently used for the formal presentation of scientific data. This is the construction of a histogram. We imagine that the scale on which the readings are taken is divided into intervals of equal length such that the locations of the division marks between intervals are specified to one more significant figure (with that figure being a 5) than is given in the readings. Thus if readings are taken to the nearest tenth of a millimeter, an interval may lie from 688.25 mm to 688.75 mm, or from 688.25 mm to 689.55 mm. The intervals should be of such length that the one in which the maximum number of readings fall contains about 20% of the total. When all this has been decided, we count the number of readings which fall in each interval and calculate the corresponding fraction of the total. The fractions are then plotted as vertical bars at the proper intervals, where the height of a bar is proportional to the fraction and the width of the bar is proportional to the length of the interval. A typical histogram is shown in Fig. 7-1.

[Figure 7-1: histogram of the dart-dropping readings; fraction of drops plotted against target interval.]

FIGURE 7-1

The procedure suggested for the less formal experiment is such that a histogram can be produced directly. The experiment is performed with a small, reasonably heavy dart and a target which consists of 30 to 40 parallel lines ruled a fixed fraction of an inch apart on a piece of paper. The central line is marked zero and those to the sides are marked 1, 2, 3, ... and −1, −2, −3, .... This line spacing constitutes the interval in the abscissa for the histogram.

The authors used a commercially available dart, which is sketched in Fig. 7-2.

[Figure 7-2: sketch of the dart with the thread attached; weight 5 g.]

FIGURE 7-2
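The counting step of the histogram procedure can be sketched as follows. The ten readings are invented for illustration, and the function name `histogram` is hypothetical; intervals that receive no readings are simply left out of the result. Each reading falls in interval number floor((x − start)/width), counted from the chosen boundary `start`.

```python
from collections import Counter
from math import floor

def histogram(readings, start, width):
    """Map each interval's lower boundary to the fraction of readings that fall in it.

    `start` is any one boundary (ending in 5, one place beyond the readings' last
    digit) and `width` is the common interval length; both are the experimenter's choice.
    """
    counts = Counter(floor((x - start) / width) for x in readings)
    total = len(readings)
    return {round(start + k * width, 2): n / total for k, n in sorted(counts.items())}

WIDTH = 0.5  # mm
# Hypothetical screen positions in mm, read to the nearest 0.1 mm.
readings = [688.3, 688.6, 688.9, 689.1, 688.4, 688.7, 688.8, 689.4, 688.5, 688.6]
hist = histogram(readings, start=688.25, width=WIDTH)
for lower, fraction in hist.items():
    print(f"{lower:.2f} to {lower + WIDTH:.2f} mm: {fraction:.2f}")
```

With only ten readings the tallest bar holds 60% of the total, far above the roughly 20% recommended in the text; with the 500 or more readings called for, narrower intervals would bring the tallest bar down toward that guideline.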
It should be well sharpened, and a thread should be cemented against the shaft. Holding the dart by this thread when aiming it for the drop will produce a well-formed distribution more reliably than if it is held by the vanes.

The experiment is performed by laying the target on a thick pad of newspaper on the floor with the lines stretching away from the experimenter. It should be laid so that when the dart hits, it will hit over a place in the newspaper pad, near a fold, for instance, where there are air layers between the pages. If the dart hits a well-packed pad, it will tend to bounce and the place of striking will be missed. By the time one is approaching the second hundred drops, this sort of behavior can be most exasperating. The reader should also be cautioned always to keep the algebraic signs of the coordinates on the target oriented in the same way.

The dart is to be held by the thread above the target, about 20 to 25 cm below the eyes while standing erect. With both eyes open, the experimenter aims the dart at the zero line and drops it. The interval in which the dart lands is the datum. One may well find that the peak of the distribution is to one side or the other of the zero line. This is the reason that maintenance of a constant orientation is necessary.

We repeat here even more strongly the earlier caution about the temptation to discard the occasional widely dispersed readings. On the other hand, the dart may at times drop accidentally from one's nervous fingers before it has been aimed. Such a drop should not be counted. Finally, from time to time it will appear at first glance that the dart has landed exactly on one of the lines. Careful examination will provide some basis on which to decide in which interval to include the hit.
7-2 CHARACTERISTICS OF AN ERROR DISTRIBUTION

The following discussion applies to either of the two experiments above, as well as to some more involved ones to which the ideas being introduced here will be extended later. It is convenient to refer specifically to one of the above experiments, and we shall choose the second. The object of the experiment was to determine the likelihood of hitting what one aims at. If the reader is like the kind assistant* who performed this experiment with Fig. 7-1 as the result, he will find that he is not at all likely to hit exactly what he aims at. On the other hand, the histogram clearly shows that there is some chance of hitting within certain bounds on each side of what is aimed at. The farther apart the bounds are, the more certain one is to drop the dart between them. Furthermore, it is clear that it is highly unlikely that a repetition of the experiment would reproduce Fig. 7-1 exactly.

* The authors would like to express their appreciation of the efforts of Jeanne Meisel. She dropped the dart well over the recorded 500 times. She took over the job as an inexperienced "guinea pig" to test the instructions given and to see whether she would produce a reasonably normal distribution. It is interesting and instructive to students that she had to learn to aim: her first 100 drops produced a wide, flat distribution. She started recounting when told that, while any such distribution was worthy of discussion, hers was not what was wanted for the present purpose.

It is likely (though not certain) that the reader will find his distribution to be symmetric. If it is not exactly so, at least it will appear that it could turn out to be symmetric if more readings were taken. Although in most of the measurements that are made of physical properties the chance errors are symmetrical, there are types of measurements in which such symmetry should not be expected. A simple example will help to clarify this point. Consider the following modification of the optical experiment first proposed for illustrating the normal distribution of errors. Instead of keeping the lighted object and the lens fixed while observing the location of the screen for sharpest focus, we can keep the object and the screen fixed and observe the location of the lens. This experiment is most effective when the distance between object and screen is slightly greater than four times the focal length of the lens; e.g., the object and screen being 1 m apart for a lens having a focal length of about 24.75 cm. Such a lens has two positions at which it makes a sharp image of the object: about 45 and 55 cm from the lighted object. Obviously, the image will go out of focus more rapidly when this object distance is reduced below 45 cm than when it is increased above 45 cm. Careful observations of the lens position will show a greater spread of readings above than below 45 cm, and the distribution of chance errors will therefore be nonsymmetrical or skew.

Following is a summary of the characteristics of an error distribution:
(a) Not every distribution is symmetric.
(b) The degree of certainty with which an experimentally measured quantity is known is itself uncertain, as is the quantity.
(c) Every additional reading changes, however slightly, the picture of the distribution of the errors.

We recall that (b) and (c) are restatements of the discussions in Chapter 6, where we distinguished between the expectation values of the binomial and the Poisson distributions and the results one might get by averaging the observations from some finite number of experiments. When the distribution is not known analytically in the sense of Chapter 6, the expectation value is not known. It is important, then, to emphasize this distinction between a hypothetically "true" value for a measured number* and the value which, on the basis of the readings at hand, appears to be most likely.
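The asymmetry in the lens example above can be made concrete with the thin-lens equation 1/u + 1/v = 1/f, an idealization assumed here for the sketch. With object and screen a distance D = 100 cm apart and f = 24.75 cm, requiring v = D − u gives u² − Du + Df = 0, whose roots are the two sharp-focus positions. As a crude proxy for how far out of focus the image is, the hypothetical function `focus_mismatch` takes the distance at which the lens would form the image minus the actual lens-to-screen distance.

```python
from math import sqrt

D = 100.0  # object-to-screen distance, cm (from the text)
F = 24.75  # focal length, cm (from the text)

def lens_positions(d, f):
    """Object distances u with 1/u + 1/(d - u) = 1/f, i.e. u**2 - d*u + d*f = 0."""
    disc = sqrt(d * d - 4 * d * f)
    return (d - disc) / 2, (d + disc) / 2

def focus_mismatch(u, d=D, f=F):
    """Thin-lens image distance minus the actual lens-to-screen distance, in cm."""
    v = u * f / (u - f)
    return v - (d - u)

print(lens_positions(D, F))  # (45.0, 55.0)
for du in (1.0, 2.0):
    # Moving the lens toward the object defocuses faster than moving it away.
    print(du, round(focus_mismatch(45 - du), 3), round(focus_mismatch(45 + du), 3))
```

Moving the lens 1 cm toward the object gives a mismatch of about 0.57 cm, against about 0.42 cm for 1 cm away, which is the sense of the skew described in the text.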
* The question of "true" value does not arise nearly so obviously in the dart-dropping experiment as in the lens-image experiment or in an experiment using a caliper, for instance. But even in the latter cases, a consideration of the differing effects of temperature on different metals, of the breadth and roughness (on a microscopic scale) of the scale division marks, etc., tends to make these "true" values almost as ephemeral as the value of the number one hits when he aims a dart at zero.

It is convenient at times to be able to discuss the "true" errors, which are the differences between the readings and the "true" values. But since the "true" value cannot be determined with certainty, the only errors that can be handled numerically are what are called residuals: the differences between the readings and the value that those readings indicate to be the most likely.

There are two important implications in the above remarks. One is that the readings must be used to determine the best value of the quantity being measured (or of some function of this quantity) as well as to determine the error distribution. If the "true" value were known, the readings would be needed only for the determination of the error distribution. The error distribution cannot be determined as well in the first case as in the second, since more information would be available in the latter.

The other implication is that an error distribution must exist independently of the readings taken. The existence of such a distribution is less ephemeral than a "true" value of the quantity being measured. Things such as the temperature or the roughness of scales, which tend to make the existence of a "true" value unlikely, are just the ones that establish the existence of an error distribution.
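The loss of information that comes from using residuals in place of true errors can be seen in a short simulation. This is a sketch under stated assumptions: the readings are drawn from a normal distribution about an assumed "true" value of 10.0 with an assumed spread, something an experimenter never actually knows. Because the mean of the readings minimizes the sum of squared deviations, the residuals always scatter less than the true errors; the shortfall is exactly n(mean − true value)².

```python
import random

def sum_of_squares(values, center):
    """Sum of squared deviations of `values` from `center`."""
    return sum((v - center) ** 2 for v in values)

random.seed(1)
TRUE_VALUE = 10.0   # assumed; in a real experiment this is unknowable
readings = [TRUE_VALUE + random.gauss(0.0, 0.5) for _ in range(20)]
n = len(readings)
mean = sum(readings) / n

ss_true = sum_of_squares(readings, TRUE_VALUE)   # squares of the true errors
ss_resid = sum_of_squares(readings, mean)        # squares of the residuals

# The residuals scatter less; the difference is exactly n * (mean - TRUE_VALUE)**2.
print(ss_resid <= ss_true, ss_true - ss_resid, n * (mean - TRUE_VALUE) ** 2)
```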
It is to be considered, then, that the readings taken are a sample of the infinite number that could be taken, just as one might check only a sample of wheat from a large storage bin. One does not examine every grain; he infers the general condition of most of the grains from appropriate samplings.

It should be clear now why the reader was cautioned not to discard the occasional readings which seem so far from what they "should" be. Perhaps they are exactly what they should be, for reasons pointed out in the discussion of Table 6-5. There it was clear that the probability of observing a number of counts per second that departs significantly from the expectation value is low. Nevertheless, there was a finite probability of observing these numbers of counts, and, especially when a large number of observations is made, they are to be expected to turn up occasionally. The same reasoning applies here. In later chapters we shall take up the matter of discarding readings, but we will show that this may be done only after careful consideration of all the readings. That is, the question of how improbable it is that a particular reading belongs to the universe being sampled must be answered by an examination of the set which includes that reading. Such an examination may indicate that it is highly improbable that the universe being sampled contains that reading, in which case the reading may be discarded.

There is a related remark to be added here. When a series of observations has been made to determine certain quantities, it is desirable to know the uncertainty in the numerical values that are rendered most probable by those particular observations. With one unknown, it is sometimes assumed that the difference between the largest and the smallest readings is the only safe, and consequently sensible, measure of that uncertainty.
If the reader has performed either or both of the experiments described, he will see that this is nonsense. This difference can never be decreased by added readings; it can only increase or remain the same as more work is done. The only logical conclusion from such an assumption is to take only one reading of any particular quantity.

While we would always like to quote a greatest error, we will usually find this hope frustrated. But even if such an error were found, it would be of little value if it were very large and of rare occurrence. What is of value is the shape of the error-distribution curve, particularly within some reasonable range of its peak, a range which might include from 70 to 90% of the readings taken. After some significant number of readings have been taken, our knowledge of this shape is not altered greatly as the number of readings increases further. On the other hand, additional readings do continually increase the certainty with which the location of the peak is known. These qualitative statements will be made quantitative in the following chapters.

PROBLEMS

1. In the dart-dropping experiment which yielded the histogram of Fig. 7-1, it is supposed that the readings can be taken only to the nearest unit, so that all readings that fall between 5 and 6, say, are read as 5.5. (a) What is the probability that a single reading lies between the limits ±3? (b) What is the probability that a single reading lies outside the limits ±3? (c) What is the probability that four observations are −2.5, −0.5, +0.5, +2.5, in that order? (d) What is the probability that four observations are −0.5, +0.5, −2.5, +2.5, in that order? (e) Hence, what is the probability that four observations have the values given in parts (c) and (d), regardless of the order?
Answer: (a) 0.616 (b) 0.384 (c) 8.43 × 10⁻⁵ (d) 8.43 × 10⁻⁵ (e) 2.02 × 10⁻³

2.
It is found that in a certain type of measurement the chance is 0.5 that any reading will fall in region A and 0.25 that it will fall in region B. If three readings are taken, what is the chance that (a) they all fall in region B, (b) the first two fall in region A and the last in B, (c) they all fall in either region A or B?
Answer: (a) (0.25)³ (b) (0.5)² × (0.25) (c) (0.75)³

3. Perform an experiment similar to the dart-dropping experiment or the focusing experiment described in Section 7-1, and answer questions like those of Problem 1 above.

CHAPTER 8
THE NORMAL-DISTRIBUTION FUNCTION

It was mentioned in the previous chapter that the histogram is excellent for instructional purposes. Its appearance suggests the existence of some continuous mathematical function, called the (real) error-distribution function, which describes the probability of occurrence of particular readings. This function $f(\xi)$ has the property that the probability that the error corresponding to a particular reading will lie in the interval between $x$ and $x + \Delta x$ is

$$\mathcal{P}(x < \xi < x + \Delta x) = \int_{x}^{x+\Delta x} f(\xi)\, d\xi. \tag{8-1}$$

This probability is represented by the shaded area in Fig. 8-1.

8-1 THE NORMAL-DISTRIBUTION FUNCTION

There are many conceivable forms of error distribution, as illustrated in Chapters 6 and 7. Our object in this chapter is to find and examine the properties of a distribution function that will describe the results of experiments having error distributions similar to that of the dart-dropping experiment of the previous chapter, i.e., distributions having the general form shown in Fig. 8-1. The principal properties of this form suggested by the histogram of Fig. 7-1 are symmetry of the distribution about zero error and the improbability of very large errors. As discussed in Chapters 6 and 7, this does not mean that very large errors can be known to have no probability of occurrence, but only that they are very improbable.
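Equation (8-1) can be evaluated numerically once a form for $f$ is chosen. As a sketch, the code below assumes the normal form $f(\xi) = (h/\sqrt{\pi})\,e^{-h^2\xi^2}$ derived later in this chapter and an arbitrary, purely illustrative $h = 0.217$. It compares a midpoint-rule estimate of the integral with the closed form $\tfrac{1}{2}[\operatorname{erf}(h(x+\Delta x)) - \operatorname{erf}(hx)]$, using Python's `math.erf`.

```python
import math

def f(xi, h):
    """Assumed normal error-distribution function (the form of Eq. 8-12)."""
    return h / math.sqrt(math.pi) * math.exp(-(h * xi) ** 2)

def prob_interval_numeric(x, dx, h, steps=1000):
    """Midpoint-rule estimate of the integral in Eq. (8-1)."""
    width = dx / steps
    return sum(f(x + (k + 0.5) * width, h) for k in range(steps)) * width

def prob_interval_exact(x, dx, h):
    """Closed form of the same integral in terms of the error function."""
    return 0.5 * (math.erf(h * (x + dx)) - math.erf(h * x))

h = 0.217  # illustrative value only
for x in (-3.0, 0.0, 2.5):
    print(x, prob_interval_numeric(x, 1.0, h), prob_interval_exact(x, 1.0, h))
```

Taking the interval wide enough to cover essentially the whole curve gives a probability of 1, as it must for any error-distribution function.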
The smooth curve described by the mathematical form, or function, to be discussed in this chapter is called the normal or Gaussian error curve, and errors which are described by it are said to be normally distributed. It is generally assumed that there are real error distributions which have the form to be given here. There have been experiments designed to verify this assumption [4]; those of the preceding chapter were in fact so designed. But the verification is valid only within the bounds of another assumption, that the distribution obtained with very large numbers of readings approaches the "true" distribution.

[Figure 8-1. The solid line represents the error-distribution function; i.e., the ordinate gives the probability per unit error, as a function of the error, for the occurrence of an error of the size of the corresponding abscissa. Thus the shaded area represents the probability that an error will be found to lie between x and x + Δx.]

Although satisfactory agreement is generally obtained in such trials, it must be pointed out that the working physical scientist usually cannot afford the time, nor does he have the inclination, to go through the boring task of proving that his errors do follow this distribution by always taking 500 to 1000 readings. However, the consequences of assuming that they do are generally safe, and certainly safer than those of making some highly unlikely assumption, of guessing at some measure of the error-distribution function, or of doing nothing at all about describing the error distribution.

Various arguments have been used as a basis for a "derivation" of the normal error distribution. Each of them is open to some sort of criticism, particularly by adherents of other methods or, especially, of other distributions. Some of the derivations [5] are based on assumptions which are attempts to reflect physical conditions.
Such a derivation arrives at the desired result with the consequence that the best value for a series of measurements on a single quantity is the arithmetic mean. Other derivations [6] accept the latter as an axiom. Many authors prefer to adopt the normal error distribution without derivation, as the result of experience. A derivation will be given here; the only claim we make for it is that it is illustrative of the arguments customarily used.

It was emphasized earlier that not all observations follow the normal error curve. Hence we might question the validity of attempting to imagine and use a "model of physical conditions" in a derivation of the normal error curve. Actually, it is enough to demonstrate that there are measurement procedures that yield data which follow the normal law with satisfactorily high probability, and that it cannot be shown that most of the data actually collected in everyday practice do not follow such a distribution. Consequently, to restate what we said earlier, the normal curve represents as good an approximation as is generally available.

The characteristics of the relationship between an experiment and the error distribution affecting that experiment were described in Chapter 6 and, in particular, in Section 7-2. It is assumed that the error distribution in question exists independently of the particular set of readings which are taken, and that the $\mathcal{P}$'s of Eq. (8-1) have a priori but unknown values.* Since the $\mathcal{P}$'s are unknown, we can only attempt to derive a function that will closely describe the readings obtained and has the properties mentioned. The derivation will be based on an attempt to associate the experimental readings with a distribution having the general form of Fig. 8-1 in such a way as to make the existence of the set of readings at hand more probable than for any other set of that size.
This means that we must deduce $f(\xi)$ in the form of an equation which will accomplish this purpose. This form of $f(\xi)$ implies the existence of a most probable value which, in this case of a symmetric function, is the same as the expectation value. Since the latter is not known, we let it be represented by $M_0$, to be derived from the $n$ readings $M_i$, $i = 1, \ldots, n$, and let

$$v_i = M_i - M_0, \tag{8-2}$$

called the residuals, represent the true errors. It is the necessity of determining $M_0$ from the values of $M_i$ that represents the loss of information referred to in Section 7-2. While it is not known that $M_0$ corresponds to the origin of the horizontal scale of Fig. 8-1, the difference between two residuals, being just the difference between the corresponding readings, is nevertheless the same as the difference between the unknown true errors which correspond to those readings. Therefore increments on a scale of $v$ are the same as increments on a scale of $\xi$.

Let $\Delta v$ ($= \Delta M$) be so chosen that there is no more than one of the total of $n$ readings in a single interval of width $\Delta v$. In Section 6-3 it was shown that the probability of the occurrence of $n$ independent events is equal to the product of the individual probabilities of occurrence of those events. Thus the probability of obtaining a particular set of $n$ observations can be written as

$$P = \prod_{i=1}^{n} \int_{v_i - \Delta v/2}^{v_i + \Delta v/2} f(\xi)\, d\xi, \tag{8-3}$$

where the notation $\prod_{i=1}^{n}$ denotes the product of the factors with $i = 1$, $i = 2$, etc., up to $i = n$. For each $i = 1, \ldots, n$, the integral represents the probability of obtaining one of the readings, so that Eq. (8-3) is the probability of observing the complete set. In the limits of integration, $v_i$ is the abscissa at the middle of its interval.

* Not only is the real error distribution unknown, but so also is the real error corresponding to a particular reading.
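A small numerical sketch of Eq. (8-3) follows. It assumes, for illustration only, a normal $f$ with $h = 1$, an interval width $\Delta v = 0.01$, and five hypothetical residuals. It multiplies the exact interval probabilities (evaluated with `math.erf`) and compares the result with the simpler product of $f(v_i)\,\Delta v$, the form the integrals take when $\Delta v$ is small.

```python
import math

def f(v, h):
    """Assumed normal error-distribution function, used only for illustration."""
    return h / math.sqrt(math.pi) * math.exp(-(h * v) ** 2)

def interval_prob(v, dv, h):
    """Integral of f over [v - dv/2, v + dv/2]: one factor of Eq. (8-3)."""
    return 0.5 * (math.erf(h * (v + dv / 2)) - math.erf(h * (v - dv / 2)))

residuals = [-1.2, -0.4, 0.1, 0.5, 1.0]  # hypothetical residuals v_i
h, dv = 1.0, 0.01                        # assumed precision constant and interval width

P_exact = math.prod(interval_prob(v, dv, h) for v in residuals)   # Eq. (8-3)
P_small_dv = math.prod(f(v, h) * dv for v in residuals)           # product of f(v_i) dv
print(P_exact, P_small_dv, P_exact / P_small_dv)
```

The ratio differs from 1 only in the fifth decimal place here, because the first correction to each factor is of order $(\Delta v)^3$.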
If $f(\xi)$ is expanded near $\xi = v_i$ by the use of Taylor's theorem, it becomes

$$f(\xi) = f(v_i) + \left.\frac{df}{d\xi}\right|_{v_i}(\xi - v_i) + \frac{1}{2}\left.\frac{d^2 f}{d\xi^2}\right|_{v_i}(\xi - v_i)^2 + \cdots.$$

When this expansion is used as the integrand in Eq. (8-3), the term linear in $(\xi - v_i)$ integrates to zero over the symmetric interval, and the next term contributes only in the third power of $\Delta v$. For small $\Delta v$, powers of $\Delta v$ of 3 and higher are so small that they can be omitted. Then Eq. (8-3) becomes

$$P = \prod_{i=1}^{n} f(v_i)\, \Delta v.$$

Here a factor $\Delta v$ appears for each $f(v_i)$; there is no need for every interval to contain a reading.

In this discussion we are attempting to deduce the form of the function $f$ that will yield a prediction that $P$ is a maximum for the readings obtained. This problem is different from the determination of those residuals (that is, the determination of the value of $M_0$) that will produce a maximum value of $P$ when $f$ is known. The value of $\ln P$ reaches a maximum when $P$ does, and $\ln P$ is more convenient to use. Thus

$$\ln P = \sum_{i=1}^{n} \ln f(v_i) + n \ln \Delta v. \tag{8-4}$$

In this expression $f(v_i)$ varies with $i$. We wish to know what form of dependence of $f$ on its argument will make $\ln P$ a maximum, that is, for what form of $\ln f(v_i)$ the variation in $\ln P$, $\delta \ln P$, is zero. The value of $\ln f(v_i)$ varies with varying values of $v_i$. Thus

$$\delta \ln f(v_i) = \left.\frac{1}{f}\frac{df}{dv}\right|_{v_i} \delta v_i,$$

and we wish to have

$$\sum_{i=1}^{n} \left.\frac{1}{f}\frac{df}{dv}\right|_{v_i} \delta v_i = 0. \tag{8-5}$$

If the individual residuals were all independent of each other, this requirement could easily be fulfilled by setting each of the $\delta v_i$ except, say, $\delta v_j$ equal to zero and then setting $(1/f)(df/dv)|_{v_j}$ equal to zero, a process that could be carried out for $j = 1, \ldots, n$. However, the value of $M_0$, whatever it may be, is common to all the residuals, so that they are not all independent. Thus the variations considered above must be carried out in a manner consistent with the constancy of some other relation between them. It is easy at this point to include another requirement as well, that the distribution function be symmetric. If $f(v) = f(-v)$, we can describe the situation by considering $f$ as a function of an even power of $v$; we choose $v^2$.
Thus the variation in the individual values of $\ln f$ must be carried out to satisfy

$$\sum_{i=1}^{n} v_i^2 = C,$$

where $C$ is some constant. Therefore, in addition to Eq. (8-5), the variations in the residuals must also satisfy the equation obtained by varying this sum (the factor 2 drops out):

$$\sum_{i=1}^{n} v_i\, \delta v_i = 0. \tag{8-6}$$

It is certainly true that if $f$ depended on any even power of $v$, it would be symmetric about $v = 0$. The second power is the simplest that satisfies the requirement; more importantly, it also leads to the normal distribution, and, as we mentioned earlier, most experiments not known to obey some other distribution, such as the Poisson, have been found to obey the normal distribution when very large numbers of readings have been taken in order to test this point.

Suppose now that Eq. (8-6) is solved for the variation in one of the residuals, $\delta v_1$ for instance. Then

$$\delta v_1 = -\frac{v_2}{v_1}\,\delta v_2 - \frac{v_3}{v_1}\,\delta v_3 - \cdots - \frac{v_n}{v_1}\,\delta v_n.$$

If we substitute this into Eq. (8-5), we obtain

$$\left.\frac{1}{f}\frac{df}{dv}\right|_{v_1}\left(-\frac{v_2}{v_1}\,\delta v_2 - \frac{v_3}{v_1}\,\delta v_3 - \cdots\right) + \left.\frac{1}{f}\frac{df}{dv}\right|_{v_2}\delta v_2 + \left.\frac{1}{f}\frac{df}{dv}\right|_{v_3}\delta v_3 + \cdots = 0.$$

Since one of the variations, $\delta v_1$, has been eliminated, corresponding to the one fixed relation which must exist between the $\delta v_i$ (because $M_0$ is common to all the residuals), the remaining variations are independent of each other. Thus the above equation can be satisfied by setting the coefficients of the various $\delta v_i$ equal to zero individually. Therefore we have

$$\left.\frac{1}{v f}\frac{df}{dv}\right|_{v_1} = \left.\frac{1}{v f}\frac{df}{dv}\right|_{v_2}, \qquad \left.\frac{1}{v f}\frac{df}{dv}\right|_{v_1} = \left.\frac{1}{v f}\frac{df}{dv}\right|_{v_3}, \qquad \text{etc.}$$

But these relations say that, regardless of the particular residual $v_i$, $i \ge 2$, for which it is evaluated, $(vf)^{-1}(df/dv)$ must always equal some constant, which happens to be written here as $(vf)^{-1}(df/dv)$ evaluated for $v_1$. It is convenient to write this constant as $-2h^2$. We then conclude that the function $f$ must have a form such that

$$\frac{1}{v f}\frac{df}{dv} = -2h^2. \tag{8-7}$$

The constant must be negative in order that $\mathcal{P}(v < \xi < v + \Delta v)$ decrease as $v^2$ increases; reference to Fig. 7-1 or Fig. 8-1 shows that when $v$ is positive, $df/dv$ is negative, and vice versa. Integration of Eq. (8-7), written as $d(\ln f) = -2h^2 v\, dv$, yields

$$f = k\, e^{-h^2 v^2}, \tag{8-8}$$

where $k$ is an arbitrary constant of integration.

Equation (8-8) will represent a smooth, continuous mathematical function if the same constant of integration can be used to relate $f$ and $v^2$ for every $v^2$. The distribution function called the normal-distribution function of real errors is then the function of $x^2$ which results when the same value of $k$ is assumed to apply to all observations; that is, the normal-distribution function is

$$f = k\, e^{-h^2 x^2}, \tag{8-9}$$

where $k$, like $h^2$, is a fixed constant for the distribution.

We remember from Chapter 6 that the sum of the probabilities of every conceivable event covered by either the binomial or the Poisson distribution must be unity. The same must be true here. The only difference is that, with this smooth mathematical function, integration replaces the summation used previously, and, as we see from Eq. (8-9), the range of events is from $-\infty$ to $+\infty$. Therefore

$$\int_{-\infty}^{\infty} k\, e^{-h^2 x^2}\, dx = 1. \tag{8-10}$$

Since this integral can be evaluated, we can obtain a relationship between the constants $k$ and $h$. In Appendix 1 it is shown that $\int_0^\infty e^{-t^2}\, dt$ is equal to $\sqrt{\pi}/2$. Hence, with $t = hx$,

$$1 = \frac{k}{h}\sqrt{\pi} \quad\text{or}\quad k = \frac{h}{\sqrt{\pi}}. \tag{8-11}$$

Thus the normal-distribution function can be written as

$$f(x) = \frac{h}{\sqrt{\pi}}\, e^{-h^2 x^2}, \tag{8-12}$$

which now contains only one adjustable constant, $h$, which determines the precision of the observations. Figure 8-2 shows graphs of the distribution function for different values of $h$. We note that the $y$-intercepts are $h/\sqrt{\pi}$; therefore the maximum heights of the curves are proportional to $h$. Since the area under each curve is one, the curves having high intercepts on the $y$-axis are very narrow, corresponding to a very narrow distribution of errors, whereas those with low $y$-intercepts have widely distributed errors. The number $h$ is called the "measure of precision," since a large $h$ means a narrow distribution of chance errors.
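The properties of Eq. (8-12) just described can be checked numerically. This sketch (the function names are illustrative, not from the text) integrates $f$ with a midpoint rule to confirm unit area for several values of $h$, reports the peak height $h/\sqrt{\pi}$, and uses a central difference to confirm that $(1/vf)(df/dv) = -2h^2$, as Eq. (8-7) requires.

```python
import math

def normal_f(x, h):
    """Eq. (8-12): f(x) = (h / sqrt(pi)) exp(-h^2 x^2)."""
    return h / math.sqrt(math.pi) * math.exp(-(h * x) ** 2)

def area(h, half_width=50.0, steps=100_000):
    """Midpoint-rule integral of f over [-half_width, half_width]; the tails are negligible."""
    dx = 2.0 * half_width / steps
    return sum(normal_f(-half_width + (k + 0.5) * dx, h) for k in range(steps)) * dx

def log_slope_over_v(v, h, eps=1e-5):
    """(1/(v f)) df/dv = (1/v) d(ln f)/dv, estimated by a central difference."""
    d_ln_f = (math.log(normal_f(v + eps, h)) - math.log(normal_f(v - eps, h))) / (2 * eps)
    return d_ln_f / v

for h in (0.5, 1.0, 2.0):
    # area (should be 1), peak height h/sqrt(pi), and the Eq. (8-7) ratio beside -2h^2
    print(h, area(h), normal_f(0.0, h), log_slope_over_v(1.3, h), -2 * h * h)
```

The peak height grows in proportion to $h$ while the area stays at 1, which is why a large $h$ corresponds to a narrow curve.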
[Figure 8-2: the normal-distribution function of Eq. (8-12) plotted for different values of h.]

It was mentioned in the introductory remarks to the above deduction that the most probable value of an observation is the same as the expectation value when the distribution is symmetric. We demonstrated this earlier with a symmetric binomial distribution; we can also demonstrate it here. The importance of this fact is emphasized by recalling that the expectation value is equivalent to the arithmetic mean.

In Chapter 6 we obtained the expectation value for an analytically known distribution by multiplying an event, or a reading, by the probability of its occurrence and taking the sum of this product over all possible events. In the present case, summation means integration. If $M_0$ is the most probable reading, then some other reading is

$$M_i = M_0 + x_i,$$

since by definition $x_i$ is the true error in $M_i$. Then the expectation value $\mu$ is

$$\mu = \frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} (M_0 + x)\, e^{-h^2 x^2}\, dx. \tag{8-13}$$

Because of the symmetry of the distribution, the integral involving $x$ as a factor is zero; the positive values just cancel the negative values. From Eqs. (8-10) and (8-11), then, we obtain

$$\mu = M_0. \tag{8-14}$$

Figure 8-2 and the accompanying discussion pointed to $h$ as a measure of the width of this distribution; but, as with the binomial and Poisson distributions, we shall defer further discussion of this most important topic until Chapter 9. Here we shall confine ourselves to an illustration.

8-2 NORMAL DISTRIBUTION: AN ILLUSTRATIVE APPLICATION

Either of the two experiments described in Chapter 7 will serve as a direct illustration of the normal-distribution function. We shall use the data given in Fig. 7-1 in the following comparison; readers should similarly work up their own data.

First, we note that the height of each bar in the histogram of Fig. 7-1 represents the probability that the dart will fall between two adjacent lines separated by a distance $\Delta x$, which is unity in this case. It has been frequently mentioned that one can never expect to define a distribution exactly with a finite number of readings. Anticipating the discussions of Chapter 10, we must fall back on Eq. (8-13) to proceed with the illustration; i.e., we consider all the observations which fell in a particular interval to have fallen at the center of that interval, and the ratio of the number of such observations to the total, 500, to be the probability of obtaining an observation in that interval. When we take the sum, according to Eq. (8-13), of the products of the readings and their probabilities of occurrence,

$$(0.5)(0.096) + (1.5)(0.110) + \cdots + (-0.5)(0.140) + (-1.5)(0.108) + \cdots,$$

we find that the expectation value is $-0.782$. Thus the error at a reading of 6.5, say, is 7.282.

For purposes of illustration, we will assume that the true distribution has been demonstrated by the 500 readings that were taken and that this distribution is normal. Thus we will use the designation $x$, which was reserved for the true error, to describe what are actually residuals. We are therefore to examine $\mathcal{P}(x < \xi < x + \Delta x)$, which it is convenient to call $\Delta P$. Here $\Delta x$ is sufficiently small for our purpose that we can write*

$$\Delta P = f(x)\, \Delta x = \frac{h}{\sqrt{\pi}}\, e^{-h^2 x^2}\, \Delta x.$$

As in our deduction of the form of the normal distribution, we find it most convenient to use the equation

$$\ln \Delta P = \ln f(x) = \ln \frac{h}{\sqrt{\pi}} - h^2 x^2, \tag{8-15}$$

since $\ln \Delta x = 0$. The data are plotted according to this equation in Fig. 8-3.

[Figure 8-3: ln f(x) versus x² for the dart-dropping data of Fig. 7-1, with the straight line discussed in the text.]

Before discussing the considerable scatter which appears in the figure, we shall make some further rudimentary numerical analysis; it is rudimentary because we have not yet introduced the proper procedures for calculating those values of the parameters which are indicated by the data to be most probable. The parameter in this case is $h^2$.
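The rudimentary analysis of Eq. (8-15) amounts to treating $\ln \Delta P$ as a straight line in $x^2$. The sketch below is illustrative only: it builds noise-free points from an assumed $h^2 = 0.047$ rather than from the actual histogram (whose full set of values is not reproduced here) and fits them by ordinary least squares, a procedure not yet introduced at this point in the text. It then applies the text's shortcut of converting an intercept of about $-2.1$ into $h$ through $\ln(h/\sqrt{\pi}) = -2.1$.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

# Synthetic, noise-free points from Eq. (8-15) with an assumed h^2 = 0.047.
h_true = math.sqrt(0.047)
x_values = [0.5 * k for k in range(-10, 11)]
X = [x * x for x in x_values]
Y = [math.log(h_true / math.sqrt(math.pi)) - h_true ** 2 * x * x for x in x_values]
intercept, slope = fit_line(X, Y)
print(-slope)                                    # h^2 recovered from the slope
print(math.exp(intercept) * math.sqrt(math.pi))  # h recovered from the intercept

# The text's shortcut: read an intercept of about -2.1 and solve for h.
h_from_intercept = math.sqrt(math.pi) * math.exp(-2.1)
print(round(h_from_intercept ** 2, 3))           # 0.047
```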
* The correlation of the observed values of $\Delta P$ with $f(x)$ will be made more precisely in Chapter 9.

The intercept on the $\ln f(x)$ axis is clearly about $-2.1$, which according to Eq. (8-15) should be the value of $\ln(h/\sqrt{\pi})$. When this equality is assumed, the value of $h^2$ turns out to be 0.047, which is the negative of the slope in Eq. (8-15). This was the method used to find the line drawn in Fig. 8-3; the line is a fair representation of the data so long as $x^2$ is not too large.

We can assume that the considerable scatter is to some extent evidence supporting the often-repeated warning that no finite number of observations can determine an analytic distribution exactly. In this case, however, a more important source of the scatter is likely to be that there is no fixed analytic distribution governing the observations. Certainly the observer did not make all 500 observations at one time, and variations in attentiveness to aiming the dart, the state of her digestive processes, etc., all tended to shift the distribution. In fact, resting between groups of observations may well have produced a more sharply defined distribution than if all 500 observations had been taken at one time.

Special attention should be paid to the more serious departure from the straight line at large values of $x^2$. The values of $\ln f(x)$ for large $x$ are seen to lie above the line; this points to another problem, that of fractional observations, which, of course, do not exist. When $n$ observations are made under the "control" of a distribution $f(x)$, the number of observations expected in an interval $\Delta x$ at $x$ is $n f(x)\,\Delta x$, for small $\Delta x$. If this number turns out to be 0.4 at some $x$, and an observation falls in that interval, then the specification of a distribution being deduced experimentally is in error by a factor of 2.5 at that point.
This error cannot be corrected unless an additional $1.5n$ observations are made without another occurring at that point. We shall discuss this problem further in later chapters, where the data of Figs. 7-1 and 8-3 will serve as a continual source of examples.

8-3 POISSON DISTRIBUTION FOR A LARGE EXPECTATION VALUE

Comparison of the Poisson distribution $\mathcal{P}(6.5, r)$ shown in Fig. 6-3 with the histogram of Fig. 7-1, the idealization of the latter shown in Fig. 8-1, and the normal distributions of Fig. 8-2 yields some suggestive similarities. The distribution of Fig. 6-3 is reasonably symmetric in a small region about the most probable value of $r$, and the latter value is close to the expectation value 6.5. In this section we wish to examine these similarities by making an approximation to the Poisson distribution for a large expectation value.

Since the principal emphasis of this book is on the estimation of expectation values for observations which are expected to follow the normal distribution, we shall make the approximation to the Poisson distribution as though the result were expected to become symmetric about the expectation value as the latter became larger. The property of the Poisson distribution which will prove this hope to be false could be demonstrated immediately, but we believe that the following order of presentation is more instructive.

While not absolutely necessary at this point, it is convenient to use, and accept without proof, a formula obtained by more advanced mathematical methods than are generally being used here. This formula says that for large $r$,

$$r! \approx r^r e^{-r} \sqrt{2\pi r} \tag{8-16}$$

is a good approximation [7].

As stated in the preceding discussion, the region of our interest is around the expectation value.
We shall keep it in view even while translating the origin to $\mu$, by letting

$$x = r - \mu. \tag{8-17}$$

Then $x$, which is the difference between a particular reading and the expectation value, plays the role of the error. If Eq. (8-17) is used in Eq. (6-6), the latter becomes

$$\mathcal{P}(\mu, x) = \frac{\mu^{\mu + x}\, e^{-\mu}}{(\mu + x)!}. \tag{8-18}$$

This is to be approximated for the case where $\mu$ is a large number but $x$ is not. In particular, the approximation is not expected to be good for negative values of $x$ with absolute values comparable to $\mu$. Under the allowed circumstances, $\mu + x$ will also be large, and Eq. (8-16) can be used in Eq. (8-18). After some straightforward algebraic manipulation the latter becomes

$$\mathcal{P}(\mu, x) = \frac{e^{x}}{\sqrt{2\pi\mu}} \left(\frac{1}{1 + x/\mu}\right)^{\mu + x} \frac{1}{\sqrt{1 + x/\mu}},$$

which can be further reduced to

$$\mathcal{P}(\mu, x) = \frac{e^{x}}{\sqrt{2\pi\mu}} \left(\frac{1}{1 + x/\mu}\right)^{\mu + x + 1/2}.$$

As is frequently the case, it is more convenient to work with $\ln \mathcal{P}$:

$$\ln \mathcal{P}(\mu, x) = -\tfrac{1}{2}\ln 2\pi\mu + x - \left(\mu + x + \tfrac{1}{2}\right)\ln\left(1 + \frac{x}{\mu}\right). \tag{8-19}$$

If $\mu$ is large, then $x/\mu$ is small, and the last term can be expanded as follows:

$$\left(\mu + x + \tfrac{1}{2}\right)\ln\left(1 + \frac{x}{\mu}\right) = \left(\mu + x + \tfrac{1}{2}\right)\left(\frac{x}{\mu} - \frac{x^2}{2\mu^2} + \frac{x^3}{3\mu^3} - \cdots\right) = x + \frac{x^2}{2\mu} + \frac{x}{2\mu} + \text{terms involving } \frac{x}{\mu} \text{ raised to powers of 2 and greater}.$$

When this result is inserted into Eq. (8-19), the leading $x$ is cancelled, and the result is

$$\ln \mathcal{P}(\mu, x) = -\tfrac{1}{2}\ln(2\pi\mu) - \frac{x^2}{2\mu} - \frac{x}{2\mu} + \text{terms involving } \frac{x}{\mu} \text{ raised to powers of 2 and greater}. \tag{8-20}$$

We see that if we set

$$h^2 = \frac{1}{2\mu}, \tag{8-21}$$

the first two terms of (8-20) represent a normal distribution. Furthermore, the terms in the higher powers of $x/\mu$ can be neglected when $\mu$ is large and $x \ll \mu$. However, the term $x/2\mu$ presents a problem when compared with $x^2/2\mu$ for the smallest allowed values of $x$, i.e., for small integers $x$. The problem arises from the fact that, for the Poisson distribution, the most probable value does not continuously approach the expectation value as the latter becomes large; the distribution does not become symmetric to a sufficient extent that, as in the normal distribution, there is a single most probable value of $r$ which becomes coincident with the expectation value. We can see this by a more careful examination of Eq. (6-6).
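A quick numerical look at Eq. (6-6) shows the effect before the algebra. The sketch below (helper names are hypothetical) evaluates the Poisson probabilities through logarithms, using `math.lgamma(r + 1)` for $\ln r!$. For $\mu = 6.5$, the value of Fig. 6-3, the most probable $r$ is 6, which lies between $\mu - 1$ and $\mu$; for an integer $\mu$ such as 100, $r = \mu - 1$ and $r = \mu$ are equally probable. A normal curve with $h^2 = 1/(2\mu)$ centered at $\mu - \tfrac{1}{2}$ reproduces that tie, while one centered at $\mu$ does not.

```python
import math

def poisson(mu, r):
    """Eq. (6-6), P(mu, r) = mu**r e**(-mu) / r!, computed via logarithms."""
    return math.exp(r * math.log(mu) - mu - math.lgamma(r + 1))

def most_probable(mu):
    """Integer r with the largest Poisson probability (searched up to 10*mu + 10)."""
    return max(range(int(10 * mu) + 10), key=lambda r: poisson(mu, r))

def normal_approx(mu, r, center):
    """Normal curve with h^2 = 1/(2 mu), Eq. (8-21), centered at `center`."""
    h2 = 1.0 / (2.0 * mu)
    return math.sqrt(h2 / math.pi) * math.exp(-h2 * (r - center) ** 2)

print(most_probable(6.5))                     # 6, between mu - 1 and mu
print(poisson(100, 99) / poisson(100, 100))   # essentially 1: a tie at the peak
# Centered at mu - 1/2, the two neighbours of the peak get equal weight...
print(normal_approx(100, 99, 99.5) == normal_approx(100, 100, 99.5))  # True
# ...centered at mu, they do not.
print(normal_approx(100, 99, 100) == normal_approx(100, 100, 100))    # False
```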
Let us suppose that $r_0$ is the single most probable value and that it differs from $\mu$ by an amount $\epsilon$:

$$r_0 = \mu + \epsilon.$$

By the definition of $r_0$, $\mathcal{P}(\mu, r_0 \pm 1) < \mathcal{P}(\mu, r_0)$. Then

$$\mathcal{P}(\mu, r_0 + 1) = \frac{\mu^{r_0+1} e^{-\mu}}{(r_0+1)!} = \frac{\mu}{r_0+1}\,\frac{\mu^{r_0} e^{-\mu}}{r_0!} = \frac{\mu}{r_0+1}\,\mathcal{P}(\mu, r_0).$$

It is required that

$$\frac{\mu}{r_0 + 1} < 1,$$

that is, $\mu < \mu + \epsilon + 1$, or

$$0 < \epsilon + 1. \tag{8-22}$$

Also,

$$\mathcal{P}(\mu, r_0 - 1) = \frac{\mu^{r_0-1} e^{-\mu}}{(r_0-1)!} = \frac{r_0}{\mu}\,\frac{\mu^{r_0} e^{-\mu}}{r_0!} = \frac{r_0}{\mu}\,\mathcal{P}(\mu, r_0).$$

Furthermore,

$$\frac{r_0}{\mu} < 1,$$

or $\mu + \epsilon < \mu$, or

$$\epsilon < 0. \tag{8-23}$$

The inequalities (8-22) and (8-23) can only be satisfied when $-1 < \epsilon < 0$. Now

$$\mathcal{P}(\mu, \mu) = \frac{\mu^{\mu} e^{-\mu}}{\mu!} = \frac{\mu^{\mu-1} e^{-\mu}}{(\mu-1)!} = \mathcal{P}(\mu, \mu - 1),$$

and since it has just been shown that $\mu - 1 < r_0 < \mu$, we choose to try

$$r_0 = \mu - \tfrac{1}{2}.$$

We now change the definition of the "error" from Eq. (8-17) to

$$y = r - r_0 = r - \mu + \tfrac{1}{2} = x + \tfrac{1}{2},$$

so that we can substitute $x = y - \tfrac{1}{2}$ into Eq. (8-20) and ignore the higher-order terms. Since $-(y - \tfrac{1}{2})^2/2\mu - (y - \tfrac{1}{2})/2\mu = -y^2/2\mu + 1/8\mu$, the terms linear in $y$ cancel, and we get

$$\ln \mathcal{P}(\mu, y) = -\tfrac{1}{2}\ln(2\pi\mu) - \frac{y^2}{2\mu} + \frac{1}{8\mu}.$$

The term $1/8\mu$ represents a negligible error in the total probability. If $\mathcal{P}(\mu, y)$ is integrated over all values of $y$ from $-\infty$ to $+\infty$, the result is $e^{1/8\mu}$ rather than unity. Since $\mu$ is very large in this discussion, this result differs from unity only by about $1/8\mu$. Furthermore, it can be shown that a slightly improved version of Eq. (8-16) will reduce this last constant term to $1/24\mu$. Thus the Poisson distribution for large values of $\mu$, when written in terms of the difference between the observations and the most probable observation, approximates a normal distribution about this most probable observation with a precision index whose square is the reciprocal of twice the expectation value of the Poisson distribution.*

PROBLEMS

1. Equation (8-3) gives the probability of obtaining that particular set of distinct readings that were found. Show that if this probability had been written for the same readings, but without regard to the order in which they were obtained, the arguments of Section 8-1 would have led to the same result.
* One sometimes sees the above discussion conclude with $\mu$ replaced by $r_0$. Careful analysis with the improved version of Eq. (8-16) mentioned above shows that the result obtained here yields a slightly more accurate value for the total probability, but in either case the difference is negligible when $\mu \gg 1$.

2. The line drawn in Fig. 8-3 has the equation $\ln f(x) = -2.1 - 0.047x^2$. By using this instead of the actually observed probabilities from the histogram of Fig. 7-1, answer the questions of Problem 1 of Chapter 7. Note that $x$ here is the error as described in Section 8-2.
Answer: (a) 0.631 (b) 0.369 (c) 1.088 × 10⁻⁴ (d) 1.088 × 10⁻⁴ (e) 2.61 × 10⁻³

3. If one had a distribution defined by the values of $f(2.5)$, $f(1.5)$, etc., found in part (a) of Problem 2, and $f = 0$ for all other values of $x$, what would be the expectation value? Note that the values of $f$ are now only relative probabilities.
Answer: $\mu = -0.193$

4. Apply the arguments of Section 8-3 directly to the binomial distribution for $p = q$ and large expectation value $\mu$, to find the distribution approached under these circumstances. Keep terms only up to $x^2$, as in Section 8-3.
Answer: $\mathcal{P} = (1/\sqrt{\pi\mu})\, e^{-x^2/\mu}$

CHAPTER 9
MEASURES OF SPREAD

When we make a series of observations of some quantity, our principal object is to determine its most probable value or its average value; as we have seen, these two values are not always the same. It is also desirable, however, to be able to describe the degree of certainty with which these values are known. This degree of certainty is determined by some measure of the width of the probability distribution, i.e., of the spread in the observations. Several quantities have been devised as measures of spread, and we shall describe them in this chapter with the aid of analytically known distributions. These quantities are defined as numbers which a reading has some specified probability of exceeding.
This probability may be 1 in 100, 1 in 20, 1 in about 3, or even 1 in 2. The last-mentioned measure of spread, a number that has one chance in two of being exceeded, was very popular a few years ago and is still used by a number of investigators. This number is known as the probable error. The name is unfortunate because it suggests to the uninitiated that this is the most probable error. Although such an interpretation leads to confusion, it is not desirable to change the name here, since it is found in so much of the literature.

When one considers measures of spread, it is important to distinguish between the spread of the individual readings and the uncertainty of the most probable value, of the average, or of any other result to be determined. It was pointed out in Section 7-1 that averages of groups of readings tend to be normally distributed. This means that if readings are picked at random from a normal distribution and collected in groups of equal size, the averages of these groups will be found to be normally distributed. In fact, even if the infinite parent distribution is not normal, the averages of groups of individual observations collected at random tend to be more and more normally distributed as the numbers of readings in the groups increase. The distribution function for these averages will always be narrower than that for the individual readings. If $h$ is the constant in the distribution function for individual readings, then the corresponding constant in the distribution function for the averages will be larger than $h$. Measures of spread are needed for both of these distributions. We shall consider the distribution for individual readings in this chapter.

By far the most important of the measures of spread is the standard deviation. It will be described first, and it will also be the only measure of spread described for distributions other than the normal one.
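A small simulation shows how the distribution of averages narrows. The sketch is illustrative only. It assumes a rectangular parent distribution on [0, 1), which is deliberately not normal, groups of 16 readings, and a fixed random seed, all chosen arbitrarily. For comparison it prints $1/\sqrt{N}$, the standard result for the ratio of the spread of averages of $N$ independent readings to that of single readings. That result is quoted here as an assumption and is not derived in this chapter.

```python
import random
import statistics

random.seed(1)            # fixed seed so that the sketch is reproducible
group_size, n_groups = 16, 2000

# Individual readings from a rectangular (non-normal) parent distribution on [0, 1).
readings = [random.random() for _ in range(group_size * n_groups)]
averages = [statistics.fmean(readings[i:i + group_size])
            for i in range(0, len(readings), group_size)]

# Population form: deviations from the sample mean, squared, divided by n.
sd_individual = statistics.pstdev(readings)
sd_average = statistics.pstdev(averages)
ratio = sd_average / sd_individual
print(f"spread of individual readings: {sd_individual:.4f}")
print(f"spread of group averages:      {sd_average:.4f}")
print(f"ratio {ratio:.3f}, compared with 1/sqrt({group_size}) = {group_size ** -0.5:.3f}")
```

The averages come out roughly four times narrower than the single readings, even though the parent distribution is flat.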
9-1 THE STANDARD DEVIATION

The standard deviation $\sigma$ is best defined by the following equation:

$$\sigma^2 = \lim_{n \to \infty} \frac{\sum x^2}{n}. \tag{9-1}$$

That is, $\sigma$ is the square root of the average of the squares of all the errors described by the analytic distribution. These errors, in particular, are measured from the expectation value.

The expression for $\sigma^2$ in a binomial distribution can be deduced by the same method used in Chapter 6 to deduce the expectation value $\mu$ ($= np$). If the error is $\mu - r$, then the probability of occurrence of this particular error is $C(n, r)p^r q^{n-r}$. As described in Chapter 6, the average of a particular quantity for an analytically determinable distribution need not involve division of the sum over a frequency distribution by the number of observations. The limiting result of that process is obtained by summing or integrating, over the range of the distribution, the product of the quantity to be averaged and the distribution. Hence

$$\sigma^2 = \sum_{r=0}^{n} (\mu - r)^2\, C(n, r)\, p^r q^{n-r}.$$

When we expand $(\mu - r)^2$ and use $\sum_r C(n,r)p^r q^{n-r} = 1$, this becomes

$$\sigma^2 = \mu^2 - 2\mu \sum_{r=0}^{n} r\,C(n, r)\,p^r q^{n-r} + \sum_{r=0}^{n} r^2\,C(n, r)\,p^r q^{n-r}.$$

The first sum in this expression is seen to be $\mu$. We apply the procedure of Chapter 6 to the second sum, that is, we write $r\,C(n, r) = n\,C(n-1, r-1)$ and set $y = r - 1$. This gives

$$\sum_{r=0}^{n} r^2\,C(n, r)\,p^r q^{n-r} = np \sum_{y=0}^{n-1} (y + 1)\,\frac{(n-1)!}{y!\,(n-1-y)!}\,p^y q^{n-1-y} = np \sum_{y=0}^{n-1} y\,\frac{(n-1)!}{y!\,(n-1-y)!}\,p^y q^{n-1-y} + \mu.$$

Repeating the procedure on the remaining sum shows it to be $np\,(n-1)p = np^2(n - 1)$. When all terms are collected, we find that

$$\sigma^2 = \mu^2 - 2\mu^2 + \mu^2 - np^2 + \mu = np(1 - p) = npq, \tag{9-2}$$

or, for the binomial distribution,

$$\sigma = \sqrt{npq}. \tag{9-3}$$

We recall from Chapter 6 how the Poisson distribution was obtained from the binomial distribution. We considered the case where $n$ became very large and $p$ very small while the expectation value $np$ retained some fixed, finite, nonzero value. Hence we can determine $\sigma^2$ for the Poisson distribution by examining Eq. (9-2) under these circumstances.
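Equation (9-2) can be checked numerically by evaluating the defining sum term by term. In this minimal sketch, the function name `binomial_variance_by_sum` and the particular $(n, p)$ pairs are illustrative choices. The last pair keeps $np = 6.5$ with small $p$, so its variance should already be close to $\mu$, as the Poisson limit discussed next requires.

```python
from math import comb

def binomial_variance_by_sum(n: int, p: float) -> float:
    """Evaluate sigma^2 = sum_r (mu - r)^2 C(n, r) p^r q^(n - r) directly."""
    q = 1.0 - p
    mu = n * p
    return sum((mu - r) ** 2 * comb(n, r) * p**r * q ** (n - r) for r in range(n + 1))

# Illustrative cases; the last one is nearly Poisson (large n, small p, np = 6.5).
cases = [(10, 0.5), (10, 1 / 6), (50, 0.3), (1000, 0.0065)]
results = {}
for n, p in cases:
    direct = binomial_variance_by_sum(n, p)
    results[(n, p)] = direct
    print(f"n={n:5d} p={p:.4f}  direct sum={direct:.6f}  npq={n * p * (1 - p):.6f}")
```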
The first three terms, each of which has a definite, finite value, mutually cancel. By the postulated conditions, the term $np^2 = (np)p = \mu p$ goes to zero. Hence for the Poisson distribution,

$$\sigma = \sqrt{np} = \sqrt{\mu}.$$

In the case of the normal distribution, integration replaces summation, but otherwise the procedure is exactly the same. That is,

$$\sigma^2 = \int_{-\infty}^{\infty} x^2\, \frac{h}{\sqrt{\pi}}\, e^{-h^2 x^2}\, dx = \frac{2h}{\sqrt{\pi}} \int_{0}^{\infty} x^2 e^{-h^2 x^2}\, dx.$$

In Appendix 2 we show that $\int_0^\infty x^2 e^{-h^2x^2}\,dx = \sqrt{\pi}/4h^3$. Hence

$$\sigma^2 = \frac{1}{2h^2}, \qquad \sigma = \frac{1}{h\sqrt{2}} = \frac{0.707}{h}. \tag{9-4}$$

Although we have not discussed the rectangular distribution, it is easy to insert here an evaluation of the standard deviation for this not uncommon case. In the rectangular distribution, no readings occur outside certain limits, $\pm d/2$ say, but all readings within these limits are equally probable, so that the probability density is $1/d$. Then

$$\sigma^2 = \frac{2}{d} \int_0^{d/2} x^2\, dx = \frac{d^2}{12}, \qquad \text{or} \qquad \sigma = \frac{d}{2\sqrt{3}}.$$

9-2 PROBABLE ERROR

The probable error may be defined as the error which divides the area of the positive side of the normal-distribution curve into two equal parts. Put another way, there is a 50% probability that an observation will produce an error of absolute value less than the probable error. For the normal distribution the probable error $p$ can be defined by the integral

$$\int_{-p}^{p} \frac{h}{\sqrt{\pi}}\, e^{-h^2 x^2}\, dx = \frac{1}{2}.$$

From this definition we can obtain $p$ as a function of $h$. Changing the variable as in Appendix 1 to $t = hx$, we obtain

$$\int_0^{hp} e^{-t^2}\, dt = \frac{\sqrt{\pi}}{4},$$

which can be integrated by expanding the exponential in a power series. Since

$$e^u = 1 + u + \frac{u^2}{2!} + \frac{u^3}{3!} + \cdots,$$

we have

$$e^{-t^2} = 1 - t^2 + \frac{t^4}{2!} - \frac{t^6}{3!} + \cdots$$

and

$$\int_0^{hp} e^{-t^2}\, dt = hp - \frac{(hp)^3}{3} + \frac{(hp)^5}{2!\,5} - \frac{(hp)^7}{3!\,7} + \cdots = \frac{\sqrt{\pi}}{4}.$$

This series converges very rapidly, so an approximate value can be obtained from a small number of terms. We can use a method of successive approximations by writing the equation as

$$hp = \frac{\sqrt{\pi}}{4} + \frac{(hp)^3}{3} - \frac{(hp)^5}{2!\,5} + \cdots,$$

where the first approximation is $hp = \sqrt{\pi}/4$. A more exact value is obtained by substituting this first approximation into the terms on the right-hand side of the equation and repeating the process with each improved value. Thus we find that $hp = 0.4769$, so that

$$p = \frac{0.4769}{h}. \tag{9-6}$$
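The successive-approximation scheme just described is easy to mechanize. In this sketch the series is truncated after a fixed number of terms, and the substitution is repeated a fixed number of times. Both counts are arbitrary but ample, because each pass shrinks the remaining error by a factor of about $1 - e^{-(hp)^2} \approx 0.2$. The result is checked against `math.erf`, the standard-library error function $\operatorname{erf}(u) = (2/\sqrt{\pi})\int_0^u e^{-t^2}\,dt$. In that notation the defining condition is simply $\operatorname{erf}(hp) = 1/2$.

```python
import math

def series_tail(u: float, terms: int = 12) -> float:
    """Terms after the first in the integrated series:
    -u^3/3 + u^5/(2! 5) - u^7/(3! 7) + ..."""
    return sum((-1) ** k * u ** (2 * k + 1) / (math.factorial(k) * (2 * k + 1))
               for k in range(1, terms))

target = math.sqrt(math.pi) / 4   # value the integral must reach
hp = target                       # first approximation, hp = sqrt(pi)/4
history = [hp]
for _ in range(40):               # substitute back into the right-hand side
    hp = target - series_tail(hp)
    history.append(hp)

print("first few approximations:", [round(x, 4) for x in history[:4]])
print(f"hp = {hp:.4f}, erf(hp) = {math.erf(hp):.6f}")
# With sigma = 1/(h sqrt 2) from Eq. (9-4), p/sigma = hp * sqrt(2).
print(f"p / sigma = hp * sqrt(2) = {hp * math.sqrt(2):.4f}")
```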
The probable error is thus inversely proportional to the precision measure $h$. From Eqs. (9-4) and (9-6) we find that the relation between $p$ and $\sigma$ is

$$p = 0.6745\,\sigma. \tag{9-7}$$

9-3 AVERAGE DEVIATION

Another measure of spread is sometimes used because it can be calculated rather quickly, although it is itself subject to large error. This is the average deviation, a.d., defined as

$$\text{a.d.} = \lim_{n \to \infty} \frac{\sum |x|}{n}.$$

In other words, it is equal to the average of the absolute values of all the errors. Taking absolute values is awkward mathematically, and the curve is symmetrical, so we make the calculation only for the positive side of the curve, i.e., from $x = 0$ to $x = \infty$. The area under the distribution in this latter range is just half that under the entire curve. Thus the average deviation is given by

$$\text{a.d.} = 2 \int_0^\infty x\, \frac{h}{\sqrt{\pi}}\, e^{-h^2 x^2}\, dx.$$

Since $d(h^2x^2) = 2h^2 x\, dx$, the average deviation can be written as

$$\text{a.d.} = \frac{1}{h\sqrt{\pi}} \int_0^\infty e^{-h^2 x^2}\, d(h^2 x^2) = \frac{1}{h\sqrt{\pi}} = \frac{0.5642}{h}. \tag{9-8}$$

From Eqs. (9-8) and (9-4) we find that

$$\text{a.d.} = 0.7979\,\sigma. \tag{9-9}$$

9-4 MEANING OF THE MEASURES OF SPREAD

The probable error has been defined only for the normal-distribution function. It is the error which divides the area under the distribution function into two equal parts, so that there is a 50% chance that any reading will have an error exceeding the probable error. It would be desirable to have this same kind of probability information for the other measures of spread. Actually, only for a continuously defined, symmetric function such as the normal function can one obtain such fixed, easily described information. Before obtaining this information for the normal-distribution function, we shall use the examples of Chapter 6 to illustrate the situation with such distributions as the binomial and the Poisson functions. From Fig.
6-1, which describes the probability of getting various numbers of heads, say from 0 to 10, in a toss of 10 coins, we see that the most probable number and the mean are identical at 5, which is a possible result of a toss. The probability of getting $5 \pm 1$, that is, the probability of getting a 4 or a 5 or a 6, is 0.6563. This distribution is defined only at discrete points, so it is not possible to have a result $p$ such that the probability of getting some result within the range $\mu \pm p$ is $\tfrac{1}{2}$.

In like manner, $\sigma = \sqrt{10}/2$ is nonintegral, so $\mu \pm \sigma$ is not a possible result. Since $1 < \sigma < 2$, the value 0.6563 is the probability of getting a result within the range $\mu \pm \sigma$, but this particular value of the probability will change if $n$ is changed. With the symmetric distribution discussed above, $\sigma$ has at least some real usefulness: the probability of getting a result between $\mu$ and $\mu + \sigma$ is the same as the probability for the range $\mu - \sigma$ to $\mu$. With pronouncedly skew distributions, even this is not true. In the case of Fig. 6-2, $\mu = 10(\tfrac{1}{6})$ is not integral and so is not a possible result. Furthermore, the probability of getting a result between $\mu$ and $\mu + 1$, say, is 0.29, while that of getting a result between $\mu - 1$ and $\mu$ is 0.32. Since $\sigma$ ($= \sqrt{10(\tfrac{1}{6})(\tfrac{5}{6})}$) is 1.18, these probabilities have the same numerical values if the limits of the ranges in question are $\mu \pm \sigma$ instead of $\mu \pm 1$. In this particular example, the distribution resembled the Poisson distribution because $p$ was rather small. Section 8-3 showed that the Poisson distribution for large $\mu$ approached a normal distribution about the most probable value. As Eqs. (8-20) and (9-4) show, the standard deviation of that distribution is still defined in terms of $\mu$. Such considerations emphasize the skewness of the binomial distribution of Fig. 6-2. In the Poisson distribution of Fig. 6-3, however, $\sigma$ is again useful even though $\mu$ is not very large.
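The probabilities quoted for Figs. 6-1, 6-2, and 6-3 in this paragraph and the next can be reproduced directly. This sketch assumes that Fig. 6-2 describes $n = 10$ trials with $p = 1/6$, as the quoted values of $\mu$ and $\sigma$ imply. It also assumes that Fig. 6-3 is a Poisson distribution with $\mu = 6.5$, as the quoted $\sigma = \sqrt{6.5}$ implies. The helper names are illustrative.

```python
from math import comb, exp, factorial, sqrt

def binom_pmf(r: int, n: int, p: float) -> float:
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def poisson_pmf(r: int, mu: float) -> float:
    return mu**r * exp(-mu) / factorial(r)

def binom_between(lo: float, hi: float, n: int, p: float) -> float:
    """Probability of an integer result r with lo < r < hi."""
    return sum(binom_pmf(r, n, p) for r in range(n + 1) if lo < r < hi)

# Fig. 6-1: ten coins, p = 1/2, mu = 5, sigma = sqrt(10)/2.
n1, p1 = 10, 0.5
mu1, sigma1 = n1 * p1, sqrt(n1 * p1 * (1 - p1))
coins = binom_between(mu1 - sigma1, mu1 + sigma1, n1, p1)   # results 4, 5, 6

# Fig. 6-2 (assumed n = 10, p = 1/6): a skew distribution.
n2, p2 = 10, 1 / 6
mu2, sigma2 = n2 * p2, sqrt(n2 * p2 * (1 - p2))
above_1 = binom_between(mu2, mu2 + 1, n2, p2)
below_1 = binom_between(mu2 - 1, mu2, n2, p2)
above_s = binom_between(mu2, mu2 + sigma2, n2, p2)
below_s = binom_between(mu2 - sigma2, mu2, n2, p2)

# Fig. 6-3 (assumed Poisson with mu = 6.5).
P = [poisson_pmf(r, 6.5) for r in range(30)]
mode = max(range(30), key=P.__getitem__)
within = sum(P[4:9])                        # observations 4 through 8
inner = (P[4] + P[5], P[7] + P[8])
outer = (P[3] + P[4] + P[5], P[7] + P[8] + P[9])

print(f"Fig. 6-1: probability within mu +/- sigma = {coins:.4f}")
print(f"Fig. 6-2: sigma = {sigma2:.2f}; above/below by 1: {above_1:.2f}/{below_1:.2f}; "
      f"by sigma: {above_s:.2f}/{below_s:.2f}")
print(f"Fig. 6-3: mode = {mode}; 4..8 -> {within:.3f}; "
      f"inner {inner[0]:.3f}/{inner[1]:.3f}; outer {outer[0]:.3f}/{outer[1]:.3f}")
```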
Considerable symmetry about the most probable value 6 is evident. For this distribution $\sigma = \sqrt{6.5} = 2.55$. The range $6 \pm 2.55$ includes observations 4 through 8, whose probabilities add up to 0.680. The probabilities for 4 and 5 counts add up to 0.257, and those for 7 and 8 to 0.265. These are near enough to each other to support the appearance of symmetry in Fig. 6-3. If the probabilities farther out on both sides are included, however, skewness becomes much more noticeable: the sum for 3, 4, and 5 is 0.326, and that for 7, 8, and 9 is 0.351. In line with the discussion of Section 8-3, this numerical symmetry is clearly lost if one discusses probabilities within the range $\mu \pm \sigma$.

The binomial distribution has two parameters, $n$ and $p$. Since $q = 1 - p$, $q$ is known when $p$ is known. The Poisson and the normal distributions each have only one parameter, $\mu$ for the former and $h^2$ for the latter; $\sigma$ could be used for either of these. Distributions which involve a single parameter are, of course, much easier to use than those which involve more than one. Fortunately, a large majority of the physical measurements we make can be handled adequately by a single-parameter distribution. The discussion of the Poisson distribution in Section 8-3 and in this section shows that even here the conditions under which one can approximate it by a normal distribution are not unduly severe. Properties of the normal distribution are copiously tabulated. We shall therefore restrict to the normal distribution our more general discussion of the probabilities of obtaining observations within arbitrary limits.

With this distribution, we can determine the probability of an error lying between zero and any specified value $X$ by integrating the distribution function from $x = 0$ to $x = X$:

$$\mathcal{P}(x < X) = \int_0^X \frac{h}{\sqrt{\pi}}\, e^{-h^2 x^2}\, dx.$$
Tables could be calculated to give values of $\mathcal{P}(x < X)$ for various values of $X$ and $h$. This would be a large task, and to be useful the tables would have to fill a great many pages. If the integral is written in terms of a new variable $y = x/\sigma$, however, we can eliminate $h$, so that we need only a relatively small number of integrations. Thus

$$\mathcal{P}(x < X) = \int_0^X \frac{h}{\sqrt{\pi}}\, e^{-h^2x^2}\, dx = \int_0^X \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}\, dx,$$

since $h^2 = 1/2\sigma^2$, and hence

$$\mathcal{P}(x < X) = \frac{1}{\sqrt{2\pi}} \int_0^{X/\sigma} e^{-(1/2)(x/\sigma)^2}\, d\!\left(\frac{x}{\sigma}\right) = \frac{1}{\sqrt{2\pi}} \int_0^{t} e^{-y^2/2}\, dy, \tag{9-10}$$

where $t = X/\sigma$. We will refer to $\mathcal{P}(x < X)$ as $P_X$. For small values of $t$, we can evaluate Eq. (9-10) by expanding $e^{-y^2/2}$ into a series as we did in Section 9-2. For larger values of $t$, other methods must be used. Table A at the end of this book gives values of $P_X$ for values of $t = X/\sigma$ between 0.0 and 5.9.

For $t = 1.0$, Table A shows that $P_X = 0.34134$; i.e., there is a 34.13% chance of an error lying between zero and $+\sigma$. There is also a 34.13% chance of an error lying between zero and $-\sigma$. The chance that the absolute value of an error will be less than $\sigma$ is therefore 68.27%, and the chance that it will exceed $\sigma$ is 31.73%. This probability of 68.27% that an error will fall in the range $\mu \pm \sigma$ should be compared with two values found earlier. For the binomial distribution of Fig. 6-1 the corresponding figure was 65.6%. For the Poisson distribution of Fig. 6-3 it was a 68.0% chance that an observation lies in the range $r_0 \pm \sigma$.

When a given experimental result is being reported, it is common practice to use either $\pm 2\sigma$ or $\pm 3\sigma$ as the limits of reliability. From Table A, $P_X$ is 0.47725 for $2\sigma$ and 0.49865 for $3\sigma$. The chance that the absolute value of the error will exceed the $2\sigma$ limit is then 4.55% (or less than 1 in 20). For the $3\sigma$ limit it is 0.27% (or less than 1 in 370). Limits wider than $3\sigma$ are seldom, if ever, used for this purpose.
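$P_X$ can also be computed from the error function in Python's standard library instead of being read from Table A, since Eq. (9-10) gives $P_X = \tfrac{1}{2}\operatorname{erf}(t/\sqrt{2})$. The sketch below also evaluates the term-by-term integrated series mentioned above, to show that the two agree. The number of series terms is an arbitrary choice, adequate for the values of $t$ used here.

```python
import math

def p_x_erf(t: float) -> float:
    """P_X of Eq. (9-10): (1/sqrt(2 pi)) * integral_0^t exp(-y^2/2) dy."""
    return 0.5 * math.erf(t / math.sqrt(2))

def p_x_series(t: float, terms: int = 40) -> float:
    """The same integral from exp(-y^2/2) = sum (-1)^k y^(2k) / (2^k k!),
    integrated term by term (practical only for moderate t)."""
    total = sum((-1) ** k * t ** (2 * k + 1) / (2**k * math.factorial(k) * (2 * k + 1))
                for k in range(terms))
    return total / math.sqrt(2 * math.pi)

for t in (1.0, 2.0, 3.0):
    one_side = p_x_erf(t)
    print(f"t = {t:.1f}: P_X = {one_side:.5f} (series {p_x_series(t):.5f}), "
          f"chance |error| > t*sigma = {1 - 2 * one_side:.4f}")
```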
[Figure 9-1: the histogram of Fig. 7-1, reproduced with the additions described in Section 9-5]

9-5 AN ILLUSTRATION

The histogram of Fig. 7-1 is reproduced here as Fig. 9-1 with some additions. Before describing these additions, we shall caution the reader again. The results of the dart-dropping experiment, given in the histogram of Fig. 7-1 and used to illustrate properties of the normal distribution in Fig. 8-3, are only illustrative. A repetition of that experiment will clearly not lead to exactly the same values for the mean and the standard deviation. We therefore cannot say that the experiment has provided any analytically determinable distribution governing the relative frequency of the various observations. The histogram approached a normal distribution closely enough to be used for these various illustrative purposes. Thus the mean $M$ of the observations is being used here as though it were known to be the expectation value of an analytically determinable normal distribution. In the same way, the value of $h^2$ ($= 0.047$), found descriptively from Fig. 8-3, is used here with Eq. (9-4) to determine a value of $\sigma$, 3.26, with which we shall continue the illustrations.

It is important to realize next that the ordinates of a distribution function were not obtained in this experiment. The numbers that were found were relative values of shaded areas like the one shown in Fig. 8-1. For comparison, Fig. 9-1 shows such values of $\Delta\mathcal{P}$ as crosses at the centers of the various intervals. They were calculated as though they were governed by a normal distribution with $\mu = -0.782$ and $\sigma = 3.26$. To illustrate the procedure, we shall describe the calculation of the value plotted at $+0.5$. This is to be the probability of finding an observation in the 0 to 1 range.
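The interval probability worked out from Table A in the next paragraph can be cross-checked with the error function. This is only a sketch: `math.erf` stands in for the table and its linear interpolation, and $\sigma$ is recomputed from $h^2 = 0.047$ by Eq. (9-4). Because no intermediate value is rounded, the result, about 0.113, differs from the plotted 0.112 only in the third decimal place.

```python
import math

def p_x(t: float) -> float:
    """Area under the normal curve from 0 to t standard deviations (Eq. 9-10).
    erf is odd, so negative t gives a negative area, as needed below."""
    return 0.5 * math.erf(t / math.sqrt(2))

def interval_probability(lo: float, hi: float, mu: float, sigma: float) -> float:
    """Probability that a reading falls between lo and hi."""
    return p_x((hi - mu) / sigma) - p_x((lo - mu) / sigma)

h2 = 0.047
sigma = 1 / math.sqrt(2 * h2)   # Eq. (9-4): sigma = 1/(h sqrt 2), about 3.26
mu = -0.782

t_low, t_high = (0 - mu) / sigma, (1 - mu) / sigma
delta_p = interval_probability(0.0, 1.0, mu, sigma)
print(f"sigma = {sigma:.2f}, t = {t_low:.3f} to {t_high:.3f}, delta P = {delta_p:.3f}")
```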
Since $\mu$ is $-0.782$, the error at 0 is $+0.782$, and that at 1 is $+1.782$. The value of the variable $t$ of Eq. (9-10) is 0.240 at 0 and 0.546 at 1.0. Linear interpolation in Table A between the areas given at 0.54 and 0.55 shows that the probability of a result between $t = 0$ and $t = 0.546$ is 0.207. The probability of finding a result between $t = 0$ and $t = 0.240$ is 0.095. Consequently, the probability of finding a result between $t = 0.240$ and $t = 0.546$ is the difference, 0.112, which is the value plotted.

Figure 8-3 showed that large residuals occurred with higher probability than should be expected if the distribution were normal. With 500 observations of this sort, we can observe probabilities only in steps of $\tfrac{1}{500}$, or 0.002, and we can never realize an expectation of, say, 0.001. If the expected probability is 0.001 and we make 500 observations, the observed probability will be either zero or 0.002. It can never come closer than that to the "correct" probability, even if the observations are controlled by a known normal distribution. Let us assume that the value 3.26 used above for $\sigma$ is correct for the normal distribution assumed to be controlling the experiment. This is a reasonable value, since it was determined by points in a range of probability where $\tfrac{1}{500}$ is a relatively small fraction of the observed probabilities. Now compare it with the $\sigma$ calculated from the definition of Eq. (9-1), neglecting the part of the definition which says $n \to \infty$. We find that 3.26 is smaller than $(\sum x^2/n)^{1/2}$ by 0.21. It appears, then, that in this trial there was a small excess of observations at large deviations. We will refer to this excess again in Chapter 10.

CHAPTER 10
METHOD OF LEAST SQUARES

The discussion of error or probability distributions has so far been confined to analytically determinable distributions. As in so much of science, such discussions are not so much utilitarian as instructional. Only rarely are the parameters of a particular distribution known.
In fact, they are never known when measurements of basic scientific interest are being made; if they were, there would be no point in making the measurements. Out of the previous discussion, then, we must extract some method of handling the distribution of a finite number of chance errors that will, we hope, yield results close to reality.

It is fortunate for the physical scientist, as opposed to the medical, biological, or social scientist, that the vast majority of his measurements appear to be drawn from distributions which have comparatively small standard deviations. For him, a relatively small number of observations will therefore define the distribution, and hence its mean, closely enough for practical purposes. Furthermore, the physical scientist is usually able to proceed satisfactorily on the assumption that the parent distribution governing the readings in his experiment is normal, or Gaussian. The methods described in this chapter are based on this same assumption. These methods will, moreover, be considered applicable regardless of the number of readings involved; the only real alternative is to report each reading. Even if this is done, the physical scientist should always make some commonly used, definitely describable, and exactly repeatable estimate of his error distribution. The method of least squares satisfies all these requirements.

10-1 FUNDAMENTAL PRINCIPLE

Since an error distribution can never be determined exactly by a finite number of readings, it is never possible to determine the true value of any quantity measured. One has to be content with the value which the available set of observations indicates to be most probable. As defined by Eq. (8-2), the residuals, which are the only available substitutes for the errors, are the differences between the readings and this most probable value of the quantity in question.
To find the most probable value $M_0$ from a set of observations, say $M_1, M_2, \ldots, M_n$, it is first necessary to know what one means by the "most probable value." Most authorities have accepted the following definition: the most probable value that can be obtained from a given set of observations is the one which makes that set of observations most probable.*

* Here we are using a technique called the method of maximum likelihood, which is just what the name implies. The evaluation of the probability that a given set of observations occurs requires values for certain parameters. Out of all the possible values of those parameters, we search for the ones which make the probability a maximum. Texts oriented specifically toward the foundations of statistical theory use and discuss the method more pointedly than is done here. See, for instance, Arley and Buch [8] or Lindgren [9].

It is therefore necessary to calculate the probability of obtaining this set of readings and then find the maximum of the expression. In this process the quantity $M_0$ must be treated as the variable, since the readings $M_1, M_2, M_3, \ldots, M_n$ are now fixed, known quantities. According to the principles set down in Section 6-3, the probability $P$ of obtaining this set of readings is the product of the probabilities for the individual readings. Thus

$$P = \Delta P_1 \cdot \Delta P_2 \cdots \Delta P_n,$$

where $\Delta P_i = \mathcal{P}(M_i - \mu < x < M_i - \mu + \Delta x)$. If, as assumed, the errors are distributed normally, then

$$\Delta P_i = \frac{h}{\sqrt{\pi}}\, e^{-h^2 x_i^2}\, \Delta x.$$

Therefore the probability of getting this set of readings is

$$P = \left(\frac{h}{\sqrt{\pi}}\right)^{n} (\Delta x)^n \exp\!\left(-h^2 \sum x_i^2\right).$$

The method of least squares estimates the expectation value of this distribution in two steps. The residuals are substituted for the errors in the argument of the exponential function, $v_i \approx x_i$, and $M_0$ is then adjusted until $P$ reaches its maximum. This process does not
affect $h$, $n$, or $\Delta x$. For example, $\Delta x$ is merely an interval along the abscissa of the distribution, and it is immaterial whether this interval is called $\Delta v$ or $\Delta x$. Since the argument of the exponential contains only squared terms, the negative sign makes the exponent definitely negative. Moreover, the residuals $v_i$ are the only variables, so the greatest value for $P$ is obtained when the exponent is as close to zero as possible. The exponent cannot be zero unless all of the $v_i$ are zero. Therefore the maximum value for $P$ is obtained when $\sum v_i^2$ is as small as possible. This is the principle involved in the method of least squares. Note, however, the more fundamental idea of finding the most probable value of the measured quantity by making the existing readings the most probable readings. This latter notion has general applicability, whereas the method of least squares applies specifically to the normal error distribution.

This proof holds not only for the simple case just treated, in which only one unknown was measured, but also for cases in which any number of unknowns are to be determined. In the particular example given, the method of least squares merely shows that $M_0$ is the arithmetic mean of the observations. We have

$$\sum v^2 = (M_1 - M_0)^2 + (M_2 - M_0)^2 + \cdots + (M_n - M_0)^2,$$

and for $\sum v^2$ to be a minimum it is necessary that

$$\frac{d\sum v^2}{dM_0} = -2(M_1 - M_0) - 2(M_2 - M_0) - \cdots - 2(M_n - M_0) = 0. \tag{10-1}$$

Thus

$$M_0 = \frac{M_1 + M_2 + \cdots + M_n}{n}.$$

In cases where there is more than one unknown, the results are more complex.

10-2 AN EXAMPLE OF LEAST SQUARES WITH MORE THAN ONE UNKNOWN

Scientific observations must frequently be made when more than one quantity is unknown. A simple example is the determination of the acceleration due to gravity $g$ by direct observations of the position of a freely falling body at known instants of time.
In the Behr free-fall apparatus,* a metal bob falls along and very close to a long strip of wax paper fastened to a vertical metallic surface. The bob has an annular knife edge. A vertical metallic bar, insulated from the metallic surface, is also mounted parallel and close to the path of the bob. The vertical bar and vertical surface are connected as electrodes to a device which produces strong sparks at the end of specified time intervals. The time intervals are usually very accurate because they are determined by the frequency of the a-c supply. Each spark jumps from the bar through the bob, and then through the wax paper to the metal surface. At the instant it jumps, the spark leaves a fine pinhole in the wax paper at the location of the annular knife edge on the bob.

* R. L. Edwards, Am. Phys. Teacher (now the Am. J. Phys.), 1, 6 (1933).

Experiments of this type are performed in most elementary physics laboratories. It is a mistake to analyze the results by assuming that the bob starts from rest at the beginning of the first interval, with its position at the end of each subsequent interval given by $S = \tfrac{1}{2}gt^2$. The bob is held and released magnetically, so it actually starts at some unknown time and has some unknown velocity at the time of the first spark following release. There are thus three unknown quantities, $g$, $S_0$, and $v_0$, the latter two being the initial position and initial velocity, respectively.

The proper way to observe the positions is to place the strip of waxed paper on a flat table and lay a meter stick along it so that the positions of the pinholes can be read on the meter stick. One should not measure individual distances between pinholes.* We may label these observations $\mathcal{S}_1, \mathcal{S}_2, \mathcal{S}_3$, etc. From theory we know that the observed positions should satisfy the expression $S = S_0 + v_0 t + \tfrac{1}{2}gt^2$, where $S_0$, $v_0$, and $g$ are unknown.
Although in this problem we may be interested only in the value of $g$, we cannot ignore the fact that $S_0$ and $v_0$ are also unknown. The values of $t$, measured from zero at the first pinhole, are assumed to be known accurately. Chance errors will actually also occur during the "observations" of $t$. However, when the timing is done by means of the local a-c power source used for controlling electric clocks, the time is determined so accurately that these errors are negligible. Fortunately, in a great many experiments in the physical sciences one of the observed quantities is determined much more accurately than the others. Chance errors in that particular quantity can then be ignored by comparison with those in the others. In what follows we shall consider only those cases in which one type of observation is in error.

* E. M. Pugh, Am. Phys. Teacher (now the Am. J. Phys.), 4, 70 (1936).

We can write a set of observation equations,

$$\begin{aligned} S_0 + v_0 t_1 + \tfrac{1}{2} g t_1^2 &\doteq \mathcal{S}_1, \\ S_0 + v_0 t_2 + \tfrac{1}{2} g t_2^2 &\doteq \mathcal{S}_2, \\ &\;\;\vdots \\ S_0 + v_0 t_n + \tfrac{1}{2} g t_n^2 &\doteq \mathcal{S}_n, \end{aligned}$$

where the symbol $\doteq$ should be read "is observed to be equal to." To illustrate the solution of this set by the method of least squares,* let us consider a special problem involving seven observations with the Behr free-fall apparatus. In this problem $t_1 = -3T$, $t_2 = -2T$, $t_3 = -T$, $t_4 = 0$, $t_5 = T$, $t_6 = 2T$, and $t_7 = 3T$, where $T$ is the period between sparks. The corresponding observations are $\mathcal{S}_{-3}, \mathcal{S}_{-2}, \mathcal{S}_{-1}, \mathcal{S}_0, \mathcal{S}_1, \mathcal{S}_2$, and $\mathcal{S}_3$. To simplify the form of the observation equations we let $\tfrac{1}{2}gT^2 = A$, $v_0T = B$, and $S_0 = C$, where $A$, $B$, and $C$ are now the unknown quantities. It is very important to note that each of these latter equations contains only one of the original unknowns.
The observation equations then become

$$\begin{aligned} 9A - 3B + C &\doteq \mathcal{S}_{-3}, & A + B + C &\doteq \mathcal{S}_{1}, \\ 4A - 2B + C &\doteq \mathcal{S}_{-2}, & 4A + 2B + C &\doteq \mathcal{S}_{2}, \\ A - B + C &\doteq \mathcal{S}_{-1}, & 9A + 3B + C &\doteq \mathcal{S}_{3}, \\ C &\doteq \mathcal{S}_{0}. \end{aligned}$$

Usually no values of $A$, $B$, and $C$ will make these genuine equalities, because the two sides of the equations differ by small residual errors. The residuals are

$$v_{-3} = \mathcal{S}_{-3} - (9A - 3B + C), \quad v_{-2} = \mathcal{S}_{-2} - (4A - 2B + C), \quad \ldots, \quad v_{3} = \mathcal{S}_{3} - (9A + 3B + C).$$

Thus

$$\sum v^2 = \{\mathcal{S}_{-3} - (9A - 3B + C)\}^2 + \cdots + \{\mathcal{S}_{3} - (9A + 3B + C)\}^2.$$

To obtain the most probable values of the unknown quantities $A$, $B$, and $C$, we must adjust each one separately. Differentiating $\sum v^2$ with respect to $A$ and setting the result equal to zero ensures a most probable value for $A$. We must also perform the same differentiation with respect to both $B$ and $C$ to obtain their most probable values. Differentiating with respect to each of these unknowns separately and equating the results to zero gives three different equations, called "normal equations":

$$\begin{aligned} 0 = \frac{\partial \sum v^2}{\partial A} &= -2 \times 9\{\mathcal{S}_{-3} - (9A - 3B + C)\} - 2 \times 4\{\mathcal{S}_{-2} - (4A - 2B + C)\} - \cdots, \\ 0 = \frac{\partial \sum v^2}{\partial B} &= +2 \times 3\{\mathcal{S}_{-3} - (9A - 3B + C)\} + 2 \times 2\{\mathcal{S}_{-2} - (4A - 2B + C)\} + \cdots, \\ 0 = \frac{\partial \sum v^2}{\partial C} &= -2\{\mathcal{S}_{-3} - (9A - 3B + C)\} - 2\{\mathcal{S}_{-2} - (4A - 2B + C)\} - \cdots. \end{aligned}$$

With these we can easily solve for the unknown quantities.

We should interrupt our discussion at this point to emphasize one aspect of what has been done. In this problem we began with seven observation equations but only three unknowns. The method of least squares has converted this overdetermined problem, with more equations than unknowns, into one with as many equations as there are unknowns.

* Various trick definitions and substitutions are used in what follows. It is well to acquire facility of this sort, since it makes for convenience and brevity during algebraic manipulations.
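The normal equations can also be formed and solved numerically, which gives a convenient check on the algebra of this section. The sketch is hypothetical throughout. The spark period $T = 1/60$ s, the "true" values of $g$, $v_0$, and $S_0$, and the 20-micrometre Gaussian reading errors are invented for illustration. The small Gaussian-elimination routine stands in for whatever method of solution one prefers. The code forms the coefficient sums of the normal equations above and compares the resulting $g$ with the closed-form expression for $g$ obtained from them in this section.

```python
import random

def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [b] for row, b in zip(matrix, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# Hypothetical apparatus and "true" values, used only to manufacture readings.
T = 1 / 60                          # spark period, s
g_true, v0_true, s0_true = 9.80, 0.75, 0.120
random.seed(3)
ks = list(range(-3, 4))             # t = kT, k = -3, ..., 3
obs = {k: s0_true + v0_true * k * T + 0.5 * g_true * (k * T) ** 2 + random.gauss(0, 2e-5)
       for k in ks}

# Observation equation for t = kT: k^2 A + k B + C is observed to equal S_k.
rows = [(k * k, k, 1) for k in ks]
N = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
rhs = [sum(row[i] * obs[k] for row, k in zip(rows, ks)) for i in range(3)]

A, B, C = solve(N, rhs)
g_normal = 2 * A / T**2             # since A = g T^2 / 2
g_closed = (5 * (obs[3] + obs[-3]) - 3 * (obs[1] + obs[-1]) - 4 * obs[0]) / (42 * T**2)
print("normal-equation coefficients:", N)
print(f"g from normal equations = {g_normal:.4f}, from closed form = {g_closed:.4f}")
```

The printed coefficient matrix, [[196, 0, 28], [0, 28, 0], [28, 0, 7]], shows the zeros produced by the symmetric choice of times.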
When we divide each of the three normal equations by $-2$ and collect terms, we obtain the following result:

$$\begin{aligned} 196A + 28C &= 9(\mathcal{S}_3 + \mathcal{S}_{-3}) + 4(\mathcal{S}_2 + \mathcal{S}_{-2}) + (\mathcal{S}_1 + \mathcal{S}_{-1}), \\ 28B &= 3(\mathcal{S}_3 - \mathcal{S}_{-3}) + 2(\mathcal{S}_2 - \mathcal{S}_{-2}) + (\mathcal{S}_1 - \mathcal{S}_{-1}), \\ 28A + 7C &= (\mathcal{S}_3 + \mathcal{S}_{-3}) + (\mathcal{S}_2 + \mathcal{S}_{-2}) + (\mathcal{S}_1 + \mathcal{S}_{-1}) + \mathcal{S}_0. \end{aligned}$$

Since $A$ is the only unknown that is desired, we solve the first and third of these normal equations together. Multiplying the third equation by 4 and subtracting it from the first, we obtain

$$A = \frac{5(\mathcal{S}_3 + \mathcal{S}_{-3}) - 3(\mathcal{S}_1 + \mathcal{S}_{-1}) - 4\mathcal{S}_0}{84}.$$

Since $A = \tfrac{1}{2}gT^2$,

$$g = \frac{5(\mathcal{S}_3 + \mathcal{S}_{-3}) - 3(\mathcal{S}_1 + \mathcal{S}_{-1}) - 4\mathcal{S}_0}{42\,T^2}.$$

If we wish, we can also calculate $B$ and $C$ from the three normal equations. Using symmetrical coefficients and subscripts made this problem much simpler, because the device made several quantities that would ordinarily appear in the normal equations equal to zero. Many problems encountered in using the method of least squares can be simplified by similar devices. A good example is the fitting of data by polynomials, which is discussed by Birge [10].

10-3 GENERAL SOLUTION FOR LINEAR UNKNOWNS

The example in Section 10-2 is typical of many problems in science: it contains several unknowns which enter the problem linearly. Numerous physical problems have this characteristic. Moreover, even when the unknowns do not enter the observation equations linearly, methods can be employed which will make them linear. It is therefore worthwhile to solve the linear problem in general terms. Before proceeding to the solution, however, we must find out at which point the linearity enters the problem. The problem at hand is the determination of values of the "constants" in the observation equations. In the example of the previous section, the experimental variable $t$ entered nonlinearly. This did not affect the linearity of the least-squares problem, since all the $t$'s and $t^2$'s were known from direct observations.
The unknowns in the problem were $g$, $v_0$, and $S_0$, which entered the equations linearly.

Consider, then, the general problem in which $q$ unknowns are to be obtained from $n$ observation equations, $n$ always being larger than $q$. Let $A, B, \ldots, Q$ be the unknown quantities whose most probable values are to be determined. They are the coefficients of the experimentally determined or assigned independent variables in the observation equations. Let the observation equations be

$$\begin{aligned} Aa_1 + Bb_1 + \cdots + Qq_1 &\doteq M_1, \\ Aa_2 + Bb_2 + \cdots + Qq_2 &\doteq M_2, \\ &\;\;\vdots \\ Aa_n + Bb_n + \cdots + Qq_n &\doteq M_n, \end{aligned} \tag{10-2}$$

where the $M$'s are the observations and the $a$'s, $b$'s, etc., are the experimentally known variables. Note, for instance, that $b_1$ could be $a_1^2$, $c_1$ could be $a_1^3$, and so on, and the least-squares problem would still be linear.

As before, we must calculate the sum of the squares of the residuals and differentiate this quantity with respect to each of the unknowns to obtain $q$ normal equations. From these $q$ normal equations we can determine the $q$ unknowns. The sum of the squares of the residuals is

$$\sum v^2 = \{M_1 - (Aa_1 + Bb_1 + \cdots + Qq_1)\}^2 + \{M_2 - (Aa_2 + Bb_2 + \cdots + Qq_2)\}^2 + \cdots + \{M_n - (Aa_n + Bb_n + \cdots + Qq_n)\}^2, \tag{10-3}$$

from which, for instance,

$$\frac{\partial \sum v^2}{\partial A} = -2a_1\{M_1 - (Aa_1 + Bb_1 + \cdots + Qq_1)\} - 2a_2\{M_2 - (Aa_2 + Bb_2 + \cdots + Qq_2)\} - \cdots - 2a_n\{M_n - (Aa_n + Bb_n + \cdots + Qq_n)\} = 0. \tag{10-4}$$

When Eq. (10-4) is divided through by $-2$ and the terms are collected, the result is

$$[aa]A + [ab]B + \cdots + [aq]Q = [aM], \tag{10-5}$$

where by definition

$$[aa] = \sum_{i=1}^{n} a_i^2, \qquad [ab] = \sum_{i=1}^{n} a_i b_i, \qquad \text{etc.}$$

Equation (10-5) is called the normal equation for $A$, since it is obtained by differentiation with respect to $A$. Similarly, we can obtain normal equations for $B$ and the other unknowns.
We then have $q$ equations in $q$ unknowns,

$$\begin{aligned} [aa]A + [ab]B + \cdots + [aq]Q &= [aM],\\ [ab]A + [bb]B + \cdots + [bq]Q &= [bM],\\ &\;\;\vdots\\ [aq]A + [bq]B + \cdots + [qq]Q &= [qM], \end{aligned} \qquad (10\text{-}6)$$

which can be solved by any of the common methods for simultaneous equations.

This general solution makes it relatively easy to prepare a problem for solution by the method of least squares whenever the observation equations can be put into the form of the general equations (10-2): we need only carry out the summations indicated by the brackets and form the normal equations. For anyone faced with doing this work entirely by hand, a convenient arrangement for calculating the sums needed in the general solution is shown in Table 10-1 for the case of two unknowns $A$ and $B$. The observation equations are written on the left-hand side, and the sums are calculated by filling in the columns to the right and adding. Fortunately, many of the summations are repeated in the normal equations, so much labor is saved by computing each sum only once.

Table 10-1
Observation equation | aa | ab | bb | aM | bM
$Aa_1 + Bb_1 \doteq M_1$ | $a_1^2$ | $a_1b_1$ | $b_1^2$ | $a_1M_1$ | $b_1M_1$
$Aa_2 + Bb_2 \doteq M_2$ | $a_2^2$ | $a_2b_2$ | $b_2^2$ | $a_2M_2$ | $b_2M_2$
$\vdots$ | | | | |
$Aa_n + Bb_n \doteq M_n$ | $a_n^2$ | $a_nb_n$ | $b_n^2$ | $a_nM_n$ | $b_nM_n$
Sums | $[aa]$ | $[ab]$ | $[bb]$ | $[aM]$ | $[bM]$

When a desk calculator is used, the sums should be accumulated as the products are formed; the individual products are not needed. Thought should be given to obtaining maximum efficiency. For example, if the $a_i$ are all unity in Eqs. (10-2), then $[ab] = \sum b$ and $[bb] = \sum b^2$, and most desk calculators can be set to accumulate these two sums simultaneously. Similarly, in the present case, $[aM] = \sum M$ and $[bM] = \sum bM$; if one is careful to enter the values of $M$ consistently in the proper way, one can also obtain these sums simultaneously.
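For machine rather than hand work, the bookkeeping of Table 10-1 generalizes directly to $q$ unknowns. The sketch below (plain Python; the helper names `normal_equations` and `solve` are hypothetical) accumulates each bracket sum once and then solves Eqs. (10-6) by Gaussian elimination with partial pivoting, which is simply one of the "common methods for simultaneous equations," not necessarily the one the authors had in mind.

```python
def normal_equations(rows, M):
    """Accumulate the bracket sums of Eqs. (10-6).

    rows[i] holds the known coefficients (a_i, b_i, ..., q_i) of observation
    equation i, and M[i] its observed value. Returns (N, r) with
    N[j][k] the bracket [jk] and r[j] the bracket [jM].
    """
    q = len(rows[0])
    N = [[0.0] * q for _ in range(q)]
    r = [0.0] * q
    for coeffs, obs in zip(rows, M):
        for j in range(q):
            r[j] += coeffs[j] * obs
            for k in range(q):
                N[j][k] += coeffs[j] * coeffs[k]
    return N, r


def solve(N, r):
    """Solve N x = r by Gaussian elimination with partial pivoting."""
    q = len(r)
    aug = [row[:] + [rhs] for row, rhs in zip(N, r)]  # augmented matrix
    for col in range(q):
        # Bring the row with the largest pivot into position.
        pivot = max(range(col, q), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, q):
            f = aug[i][col] / aug[col][col]
            for k in range(col, q + 1):
                aug[i][k] -= f * aug[col][k]
    # Back substitution.
    x = [0.0] * q
    for i in reversed(range(q)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, q))
        x[i] = (aug[i][q] - s) / aug[i][i]
    return x


# Example: fit M = A + B x + C x^2 to five exact (made-up) points.
xs = [0, 1, 2, 3, 4]
rows = [(1.0, x, x * x) for x in xs]
M = [2 + 3 * x - 0.5 * x * x for x in xs]
print(solve(*normal_equations(rows, M)))
```

Because $b_i = x_i$ and $c_i = x_i^2$ here, the example is also a case of the remark after Eqs. (10-2): the unknowns enter linearly even though $x$ does not.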
One may think that an automatic computer is ideal for this job: the whole business can be programmed so that the machine will put out the least-squares values of the unknowns, the errors in them, and any other information that is desired and can be obtained from the available data. Indeed, this is all true, but, remarkably enough, there is a precaution that must be observed regarding the number of digits carried in computations. Warning was given in Chapter 2 that while solving simultaneous equations, one must assume that the coefficients of the unknowns are exact. In any solution, however, differences of nearly equal numbers develop, and these can be known sufficiently accurately only if a large number of digits is carried along, in keeping with the assumption that the starting numbers are exact. It has been roughly true that the larger and faster the automatic computer, the smaller the number of digits it can store at a single memory location, a failing that is being corrected in the newer computers. The authors once saw least-squares calculations of exactly the sort being discussed here, i.e., with three unknowns, go badly awry on an IBM 704 for just this reason! Correct answers were obtained when each number was carried with twice as many digits, stored in two memory locations.

After mentioning this example, let us emphasize that the user of a desk calculator must watch for this same problem. Every possible digit should be carried, and the results must be examined with a critical eye. This admonition often applies to the generation of $[aM]$, $[aa]$, etc., as well. The safest rule here, contrary to Section 3-1, is to carry in every operation at least as many digits as fit on a 10-bank calculator. The least check one should make is to see whether the calculated curve goes through the data, as described in Section 4-3.

Whether with the aid of a desk calculator or an automatic computer, we can use any of a number of methods to solve the normal equations.
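The warning about carrying digits can be made concrete with Python's `decimal` module, which rounds every arithmetic operation to a chosen number of significant digits. The hypothetical function `slope_intercept` below forms the sums for a straight-line fit $y = A + Bx$ (the two-unknown case of Table 10-1 with every $a_i = 1$) and solves the two normal equations. The data, five exact points on $y = 3 + 0.5x$ near $x = 1000$, are invented for illustration: with 28 digits the fit is exact, but with only six significant digits the differences of nearly equal sums collapse and the slope comes out as zero.

```python
from decimal import Decimal, localcontext


def slope_intercept(xs, ys, digits):
    """Least-squares A, B for y = A + Bx, rounding every operation
    to `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        n = Decimal(len(xs))
        sx = sy = sxx = sxy = Decimal(0)
        for x, y in zip(xs, ys):
            sx += x          # sum of x   ([ab] with a_i = 1)
            sy += y          # sum of y   ([aM])
            sxx += x * x     # sum of x^2 ([bb])
            sxy += x * y     # sum of xy  ([bM])
        # The determinant is a small difference of two large, nearly equal numbers.
        delta = n * sxx - sx * sx
        A = (sxx * sy - sx * sxy) / delta
        B = (n * sxy - sx * sy) / delta
    return A, B


xs = [Decimal(v) for v in range(1000, 1005)]
ys = [Decimal(3) + Decimal("0.5") * x for x in xs]  # exact line, no noise
print(slope_intercept(xs, ys, 28))  # enough digits: A = 3, B = 0.5
print(slope_intercept(xs, ys, 6))   # too few digits: the slope is lost
```

Shifting the origin of $x$ to the middle of the data (here, subtracting 1002) removes most of the cancellation; this is the same kind of device as the symmetrical subscripts of Section 10-2.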
Frequently, as in the particular problem with the Behr free-fall apparatus just solved, the method of solution is obvious. We can, however, give a general solution by using determinantal notation. For example, the second unknown $B$ is given by

$$B = \frac{\begin{vmatrix} [aa] & [aM] & [ac] & \cdots & [aq]\\ [ab] & [bM] & [bc] & \cdots & [bq]\\ \vdots & \vdots & \vdots & & \vdots\\ [aq] & [qM] & [cq] & \cdots & [qq] \end{vmatrix}}{\begin{vmatrix} [aa] & [ab] & [ac] & \cdots & [aq]\\ [ab] & [bb] & [bc] & \cdots & [bq]\\ \vdots & \vdots & \vdots & & \vdots\\ [aq] & [bq] & [cq] & \cdots & [qq] \end{vmatrix}}.$$

A similar expression can be written for each of the other unknowns. The use of determinants can be very tedious when the number of unknowns is greater than three; in such an event the reader may well find other methods more attractive. One such method* is described in Appendix 9. If it is to be used exactly as described, the unknowns should be designated so that $|A| < |B| < \cdots < |Q|$, as well as one can judge, with the order of the equations maintained as shown in Eqs. (10-6), where the first is the normal equation for $A$, the second the normal equation for $B$, and so on. This is accomplished most conveniently by starting with the unknowns arranged in this order in the observation equations (10-2).

* Another useful method is Jordan elimination. See, for instance, E. Bodewig, Matrix Calculus (New York: Interscience, 1959), p. 107.

10-4 LEAST-SQUARES FITTING OF A STRAIGHT LINE

It was pointed out in Section 4-3 that one should learn as much as possible about the fitting of a straight line to (appropriate) experimental data. In this section we describe the computational methods to be applied when there is no a priori reason to suppose that the data are not all equally worthy of consideration.

Various notations are used to describe the general straight line, particularly in the literature on the subject under consideration here. The most common of these are $y = a + bx$, as used in Section 4-3, and $y = mx + b$. There is much confusion between $y = a + bx$ and the equally common $y = ax + b$.
In all these equations $x$ is the independent variable, whose values are assigned during the experiment, and $y$ is the observed dependent variable. A specific example would be a body moving along a scale with constant velocity $v$. If at time $t = 0$ the body is at $s_0$ on the scale, its subsequent positions $s$ will be given by $s = s_0 + vt$. Generally one can read a "clock" more accurately than a scale position; hence the latter is treated as the quantity to be observed at assigned values of $t$.

Notation tends to become somewhat confusing in this work because the experimental variables become the least-squares constants, and vice versa. In the above example, $s$ and $t$ are observed and are therefore treated as constants, while $s_0$ and $v$ must be treated as variables, since their most probable values are being determined. In the computations of this section, the form $y = A + Bx$ will be used. Here numerical values of $y$ and $x$ are known as the result of performing an experiment; $A$ and $B$ are the quantities whose numerical values are to be determined by (and hence are the variables for) the least-squares minimizing process. In the general straight-line equation, the coefficient of $x$ is called the slope, and the constant term the intercept. A plot of $s$ vs. $t$ for the specific example would then have $v$ for its slope and $s_0$ for its intercept.

We next note that the number of quantities to be determined by the method of least squares, two, is the same as in the example of Section 10-3 to which Table 10-1 applies. Indeed, we can cast the observation equation $Aa_1 + Bb_1 \doteq M_1$ into the form of this section by dividing by $a_1$ and letting $b_1/a_1$ play the role of $x_1$ and $M_1/a_1$ the role of $y_1$. The reader should be cautioned, however, to be sure he knows what he is doing before making such a transformation.
For instance, the original observation equation implies that $M_1$ is the quantity actually observed in the experiment, not $M_1/a_1$. This point is important, and we shall see why when we discuss observations of unequal weight, where division by $a_1$ changes the weight of $M_1$. Note that in the specific example of the moving body, it is actually $s$ which is observed at given values of $t$.

Since the case where the observation equations are of the form $y \doteq A + Bx$ is one of the most common, it is desirable to restate it. Suppose that when $x$ has the value $x_i$, $y$ is observed to have the value $y_i$. If the quantity $A + Bx_i$ is calculated by using some pair of values for $A$ and $B$, then the difference between this and the observed value is $A + Bx_i - y_i$. We apply the method of least squares by squaring this number and those obtained for all other values of $i$, summing these squares, and adjusting $A$ and $B$ until this sum is a minimum. This last operation is done with the aid of calculus, as before. Thus

$$\sum_i v_i^2 = \sum_i (A + Bx_i - y_i)^2,$$
$$0 = \frac{\partial \sum v^2}{\partial A} = 2\sum_i (A + Bx_i - y_i), \qquad (10\text{-}7)$$
$$0 = \frac{\partial \sum v^2}{\partial B} = 2\sum_i x_i(A + Bx_i - y_i). \qquad (10\text{-}8)$$

Since $\sum_i A$ means the addition of as many values of $A$ as there are values of $i$, if we let $i = 1, \ldots, n$, Eq. (10-7) becomes

$$nA + B\sum_i x_i = \sum_i y_i,$$

and Eq. (10-8) becomes

$$A\sum_i x_i + B\sum_i x_i^2 = \sum_i x_iy_i.$$

These are to be solved for $A$ and $B$. The result is

$$A = \Delta^{-1}\left[\Big(\sum_i y_i\Big)\Big(\sum_i x_i^2\Big) - \Big(\sum_i x_i\Big)\Big(\sum_i x_iy_i\Big)\right], \qquad (10\text{-}9)$$
$$B = \Delta^{-1}\left[n\sum_i x_iy_i - \Big(\sum_i x_i\Big)\Big(\sum_i y_i\Big)\right], \qquad (10\text{-}10)$$

where

$$\Delta = n\sum_i x_i^2 - \Big(\sum_i x_i\Big)^2.$$

An example is shown in Table 10-2. To be sure that he has a complete understanding of the procedures, the reader should verify all the sums and solutions shown in the table for the values of $x_i$ and $y_i$ given. If a desk calculator is available, it can be used in the manner suggested in the previous section.

Table 10-2
$x_i$ | $y_i$
0.1 | 5.9
0.7 | 42.0
1.3 | 87.6
$\sum x_i = 2.1$ | $\sum y_i = 135.5$
$\sum x_i^2 = 2.19$ | $\sum x_iy_i = 143.87$
$\Delta = 2.16$ | $A = -2.49$, $B = 68.08$

A fairly common special case of the two-parameter equation is that in which the values of $x$ are evenly spaced. For instance, in the Kundt's tube experiment* the wavelength of sound in the tube is determined by measuring the distance between the equally spaced nodes in the cork dust. The only sensible way to make this measurement is to place a meter stick along the tube and note the positions of the nodes on it, as was described for the Behr free-fall experiment. In this way, if an error in observation is made at one point which tends to increase the distance between one pair of nodes, the same error will tend to decrease the distance between an adjacent pair, thus partly compensating for the error.

Naively, one might expect that the best way to obtain the spacing from such a set of measurements would be to subtract adjacent readings to obtain approximate spacings and then to average the results. However, if one does this algebraically, he discovers that the average obtained is

$$\bar d = \frac{(y_r - y_{r-1}) + (y_{r-1} - y_{r-2}) + \cdots + (y_{-r+1} - y_{-r})}{n} = \frac{y_r - y_{-r}}{n},$$

where $n$ here is the number of differences averaged. The final result obtained by this procedure retains only the two end observations and does not take advantage of any of the accuracy that should be available from the other observations that have been made.

The proper procedure is to consider the observations as representing a straight-line plot of $y$ vs. $x$, where $y$ is the reading on the meter stick and $x$ is the ordinal number of the node being measured. Then the slope represents the desired spacing. We shall consider two cases.

* See, for instance, F. W. Sears and M. W. Zemansky, University Physics (Reading, Mass.: Addison-Wesley, 1964), p. 503.
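The straight-line formulas (10-9) and (10-10), and the telescoping of the naive adjacent-difference average, can both be checked with a short Python sketch. The function names are hypothetical; the $y$ values 5.9 and 42.0 used for $x = 0.1$ and $0.7$ are the ones implied by the sums printed in Table 10-2, and the node positions are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares A and B for y = A + Bx, Eqs. (10-9) and (10-10)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sxx - sx * sx
    A = (sy * sxx - sx * sxy) / delta
    B = (n * sxy - sx * sy) / delta
    return A, B


def naive_spacing(ys):
    """Average of adjacent differences; the sum telescopes to the end readings."""
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    return sum(diffs) / len(diffs)


# Table 10-2 data (y values for x = 0.1 and 0.7 inferred from the printed sums).
A, B = fit_line([0.1, 0.7, 1.3], [5.9, 42.0, 87.6])
print(f"A = {A:.2f}, B = {B:.2f}")

# Hypothetical node positions on a meter stick, in cm.
nodes = [10.2, 27.1, 43.7, 61.0, 77.6, 94.9]
print(naive_spacing(nodes), (nodes[-1] - nodes[0]) / (len(nodes) - 1))
print(fit_line(list(range(len(nodes))), nodes)[1])  # least-squares spacing
```

The first line printed should reproduce $A = -2.49$ and $B = 68.08$ of Table 10-2. The two numbers on the second line agree, showing that the naive average depends only on the end readings, while the last line is the least-squares slope, which uses every reading.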
If there is an odd number of readings, let $y$ for the central one be called $y_0$, let the spacing between values of $x$ be called $d$ ($= 1$ for the Kundt's tube example), and let there be $r$ readings on each side of the origin. Then $2r + 1 = n$, the number of readings, and

$$y_i = y_0 + Bid, \qquad -r \le i \le r.$$

With the aid of some hints given in Appendix 3, we can show from Eq. (10-10) that

$$B = \frac{r(y_r - y_{-r}) + \cdots + 3(y_3 - y_{-3}) + 2(y_2 - y_{-2}) + (y_1 - y_{-1})}{2d\{r^2 + (r-1)^2 + \cdots + 3^2 + 2^2 + 1^2\}} = \frac{\sum_{i=1}^{r} i(y_i - y_{-i})}{2d\sum_{i=1}^{r} i^2}.$$

Note that the central reading does not appear.

In the other case, if there is an even number of readings, let the origin be halfway between the two central readings. Then

$$y_i = y_0 + B\,\frac{(2i-1)d}{2}, \qquad 0 < i \le r,$$
$$y_i = y_0 + B\,\frac{(2i+1)d}{2}, \qquad -r \le i < 0,$$

and $n = 2r$. Thus we have

$$B = \frac{(2r-1)(y_r - y_{-r}) + \cdots + 3(y_2 - y_{-2}) + (y_1 - y_{-1})}{d[(2r-1)^2 + \cdots + 3^2 + 1^2]},$$

and all the readings appear.

10-5 OBSERVATIONS OF UNEQUAL WEIGHT

Thus far in the treatment of least squares we have assumed that all the observations have the same precision measure. But it frequently happens that some observations are made with considerably greater precision than others. A good illustration is found in an early version of the Behr free-fall experiment,* in which the position of the falling body is recorded by a vibrating tuning fork, fixed to the body, that makes a wavy trace on waxed paper. The peaks of the waves, instead of the pin holes, determine the positions. When the body is moving slowly, the waves are close together and the peaks are very sharp, so positions can be determined with considerable accuracy. When the body is moving at higher speed, however, the peaks spread out into long waves, and it is much more difficult to determine the exact location of the peaks.

* R. L. Edwards, loc. cit.
In this case, the uncertainty in the observations made with the body traveling at high speed is large compared with that for the observations made with the body moving at low speed. Examples of weighting, including this one, will be discussed later.

Where the observations are of unequal precision, we cannot consider them as coming from the same infinite parent distribution of errors. The errors of the various observations must be characterized by different values of the measure of precision $h$ in the equation of the normal-distribution function. Assume that a set of $n$ readings have errors $x_i$, each drawn from a distribution with a different precision index $h_i$ but all with the same expectation value. Then, as in Section 10-1, the probability of obtaining the set is

$$P = \prod_{i=1}^{n}\left(\frac{h_i}{\sqrt{\pi}}\right)\exp\left[-\sum_{i=1}^{n} h_i^2x_i^2\right].$$

As before, the common expectation value is estimated by replacing the $x_i$ by $v_i$ and adjusting the latter until $P$ reaches its maximum. This maximum is reached when $\sum (h^2v^2)$ is as small as possible; that is, the most probable values of the unknowns are obtained when $\sum (h^2v^2)$ is a minimum. This is the most general statement of the method of least squares. Of course, since we have not yet discussed any procedures for estimating the precision measures or, equivalently, the standard deviations of the various distributions, this statement is not yet usable for numerical work. We shall discuss these procedures in Section 10-8.

A more useful form can be found by letting

$$h_1^2 = w_1h^2, \quad h_2^2 = w_2h^2, \quad \ldots, \quad h_n^2 = w_nh^2, \qquad (10\text{-}11)$$

where $h^2$ is some constant; it could be the least of the $h_i^2$, with a corresponding $w$ of unity. The statement of the method of least squares then becomes

$$h^2(w_1v_1^2 + w_2v_2^2 + \cdots + w_nv_n^2) = \text{minimum}. \qquad (10\text{-}12)$$

To see what this expression means, let us consider the case of observations on a single unknown, in which

$$v_1 = R_1 - M, \quad v_2 = R_2 - M, \quad \ldots, \quad v_n = R_n - M.$$
Here $M$ is the most probable value of the unknown quantity and should be considered a variable until its best value has been determined. Then

$$\sum wv^2 = w_1(R_1 - M)^2 + w_2(R_2 - M)^2 + \cdots + w_n(R_n - M)^2.$$

This expression is to be differentiated with respect to $M$ and the result set equal to zero:

$$0 = \frac{d\sum wv^2}{dM} = -2w_1(R_1 - M) - 2w_2(R_2 - M) - \cdots - 2w_n(R_n - M).$$

Solving this for $M$ gives

$$M = \frac{w_1R_1 + w_2R_2 + \cdots + w_nR_n}{w_1 + w_2 + \cdots + w_n},$$

which is the common form used in obtaining weighted averages. This shows that the $w$'s, introduced with Eqs. (10-11) without any other definition, are in fact the weights of the observations, and that $h$ represents the precision measure of observations of unit weight.

Equations (10-11) yield a very important relation between the weights and the precision measures, namely,

$$\frac{h_1^2}{h_2^2} = \frac{w_1}{w_2}, \qquad (10\text{-}13)$$

which shows that the squares of the precision measures are directly proportional to the weights of the corresponding observations. Conversely, the squares of the measures of spread are inversely proportional to the weights of the observations.

Note that the repeated appearance of a particular observation increases its weight accordingly. This can be seen by referring to Eqs. (10-11) and (10-12). If, from the universe characterized by $h$, the residual $v_1$ is drawn $w_1$ times, the residual $v_2$ drawn $w_2$ times, and so on, the complete sum of the squares of all the residuals will be exactly the sum in Eq. (10-12).

It is now possible to set down some very important relationships among precision measures, weights, and measures of spread. By combining Eqs. (9-4), (9-6), (9-8), and (10-13), we obtain the following equations:

$$\frac{h_1^2}{h_2^2} = \frac{w_1}{w_2} = \frac{\sigma_2^2}{\sigma_1^2} = \frac{p_2^2}{p_1^2} = \frac{(\text{a.d.})_2^2}{(\text{a.d.})_1^2}. \qquad (10\text{-}14)$$

If $h$ and $\sigma$ are respectively the precision measure and the standard deviation for the distribution of individual observations, and if $h_0$ and $\sigma_0$ are the precision measure and standard deviation for the distribution of averages of random samples of $n$ of these readings, then from Eqs. (10-14) we find that

$$\frac{h_0^2}{h^2} = n = \frac{\sigma^2}{\sigma_0^2}, \qquad \text{or} \qquad \sigma_0^2 = \frac{\sigma^2}{n}. \qquad (10\text{-}15)$$

This result will be demonstrated more rigorously in Chapter 12.

Let us now return to the more general case, which is also governed by Eq. (10-12). A set of observation equations like those in Section 10-3 may frequently result from observations that are by no means of equal precision; or it may be possible and convenient to convert the actual observations to a linear form in which the $M$'s are functions of the actual observations and so have weights different from those of the latter. In such cases the equations should be treated by making $\sum wv^2$ a minimum, as indicated by Eq. (10-12). Otherwise the procedure is exactly as in Section 10-3. The resulting normal equations are

$$\begin{aligned} [waa]A + [wab]B + \cdots + [waq]Q &= [waM],\\ [wab]A + [wbb]B + \cdots + [wbq]Q &= [wbM],\\ &\;\;\vdots\\ [waq]A + [wbq]B + \cdots + [wqq]Q &= [wqM], \end{aligned} \qquad (10\text{-}16)$$

where, for instance,

$$[waa] = \sum_{i=1}^{n} w_ia_i^2.$$

The result again is a set of simultaneous equations in $q$ unknowns, and the solution proceeds just as before. Examples will be given in the following sections.

10-6 CONDITION EQUATIONS

It sometimes happens that a set of observations of several unknown quantities must satisfy exactly one or more theoretical conditions existing among the unknowns. For example, when we measure the three angles $A$, $B$, $C$ of a plane triangle, the observations must satisfy the theoretical condition $A + B + C = 180^\circ$. With this condition equation we can reduce the number of unknown quantities from three to two. There are then three observation equations in two unknowns, and we can apply the method of least squares.
When there is more than one condition equation among the unknowns, each condition equation can be used to eliminate one of the unknown quantities. Thus we can reduce the number of unknowns by the number of condition equations.

Let us consider a specific problem. Assume that observations on the three angles of a triangle have given the following results:

Angle | Observation | Weight
$A$ | $\doteq 42.31^\circ$ | 2
$B$ | $\doteq 75.89^\circ$ | 1
$C$ | $\doteq 62.13^\circ$ | 1

We can eliminate $C$ from the third observation equation by means of the condition equation and obtain

$$-A - B \doteq -117.87^\circ.$$

The resulting three observation equations in $A$ and $B$ can be solved by the methods already given, but the amount of arithmetic can be considerably reduced by a device that is useful in many problems with the method of least squares. First, we find approximate values for the unknown quantities. Then we take as new unknowns the differences between these approximate values and the most probable values of the unknowns. In this problem, we let

$$A = 42.31^\circ + z_1, \qquad B = 75.89^\circ + z_2.$$

Substituting these values in the observation equations and using the method of calculating summations given in Table 10-1, we obtain Table 10-3. The resulting normal equations are

$$3z_1 + z_2 = -0.33^\circ, \qquad z_1 + 2z_2 = -0.33^\circ.$$

Table 10-3
Equation | Weight | waa | wab | wbb | waM | wbM
$z_1 \doteq 0$ | 2 | 2 | 0 | 0 | 0 | 0
$z_2 \doteq 0$ | 1 | 0 | 0 | 1 | 0 | 0
$-z_1 - z_2 \doteq 0.33^\circ$ | 1 | 1 | 1 | 1 | $-0.33^\circ$ | $-0.33^\circ$
Summations | | 3 | 1 | 2 | $-0.33^\circ$ | $-0.33^\circ$

Using determinants, we find that

$$z_1 = \frac{\begin{vmatrix} -0.33 & 1\\ -0.33 & 2 \end{vmatrix}}{\begin{vmatrix} 3 & 1\\ 1 & 2 \end{vmatrix}} = -0.066^\circ.$$

Similarly, $z_2 = -0.132^\circ$. The most probable values are then

$$A = 42.31^\circ - 0.066^\circ = 42.244^\circ, \qquad B = 75.89^\circ - 0.132^\circ = 75.758^\circ.$$

From the condition equation we find that $C = 61.998^\circ$. The same result would have been obtained if the condition equation had been used to eliminate either $A$ or $B$ instead of $C$. Had the unknown being eliminated appeared in more than one observation equation, we would have been able to eliminate it from each by the use of the condition equation.
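The triangle adjustment can be reproduced in a few lines. This Python sketch forms the weighted bracket sums of Table 10-3 from the three observation equations in $z_1$ and $z_2$ and solves the normal equations by Cramer's rule, mirroring the determinant solution in the text; the helper `det2` and the tuple layout of `obs` are hypothetical choices for illustration.

```python
def det2(a, b, c, d):
    """Determinant of the 2x2 matrix [[a, b], [c, d]]."""
    return a * d - b * c


# Observation equations in z1, z2 (Table 10-3):
# (coefficient of z1, coefficient of z2, observed value in degrees, weight).
obs = [
    (1.0, 0.0, 0.0, 2.0),      # from A = 42.31 deg:  z1 = 0, weight 2
    (0.0, 1.0, 0.0, 1.0),      # from B = 75.89 deg:  z2 = 0, weight 1
    (-1.0, -1.0, 0.33, 1.0),   # C eliminated:  -z1 - z2 = 0.33 deg, weight 1
]

# Weighted bracket sums, as in Eqs. (10-16).
waa = sum(w * a * a for a, b, m, w in obs)
wab = sum(w * a * b for a, b, m, w in obs)
wbb = sum(w * b * b for a, b, m, w in obs)
waM = sum(w * a * m for a, b, m, w in obs)
wbM = sum(w * b * m for a, b, m, w in obs)

# Cramer's rule: replace a column of the coefficient determinant by the right-hand sides.
D = det2(waa, wab, wab, wbb)
z1 = det2(waM, wab, wbM, wbb) / D
z2 = det2(waa, waM, wab, wbM) / D

A = 42.31 + z1
B = 75.89 + z2
C = 180.0 - A - B   # the condition equation
print(f"z1 = {z1:.3f}, z2 = {z2:.3f}")
print(f"A = {A:.3f}, B = {B:.3f}, C = {C:.3f}")
```

The sums come out as 3, 1, 2, $-0.33$, and $-0.33$, matching the bottom row of Table 10-3, and the adjusted angles should agree with those found above.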
The example used above is, of course, one of common occurrence, but it fails to point out a pitfall in the use of condition equations when the observations have unequal weights. The general nature of this danger is common to all work with observations of unequal weight. Suppose one has observed values of, say, $A$, $B$, and $C$, each having a different known weight, but the condition equation is of the form $aA + bB + cC = Q$ rather than $A + B + C = Q$. One must then remember that it is the weights of $A$, $B$, and $C$ that are known, not the weights of $aA$, $bB$, or $cC$. Thus, if the value of $B$ is to be replaced by use of the condition equation, the latter must be used in the form

$$\frac{a}{b}A + B + \frac{c}{b}C = \frac{Q}{b}.$$

10-7 NONLINEAR OBSERVATION EQUATIONS

So far, in the applications of the method of least squares, we have treated only observation equations that are linear in the unknown quantities. It frequently happens, however, that one wishes to apply the method of least squares to observation equations in which the unknowns enter nonlinearly. It is simplest to treat the case of two unknowns, from whose solution the extension to any number of unknowns is obvious. Assume that the unknown quantities are $Z_1$ and $Z_2$, and that the observation equations are

$$\begin{array}{ll} \text{Equation} & \text{Weight}\\ f_1(Z_1, Z_2) \doteq M_1 & w_1\\ f_2(Z_1, Z_2) \doteq M_2 & w_2\\ \qquad\vdots & \;\vdots\\ f_n(Z_1, Z_2) \doteq M_n & w_n \end{array} \qquad (10\text{-}17)$$

where the functions $f_1, f_2, \ldots, f_n$ are nonlinear in $Z_1$ and $Z_2$.

It is almost never desirable to solve a problem of this nature by the method of least squares until one has obtained approximate values for the unknowns by other methods, which usually require less calculation. Whenever possible, even with linear functions, the graphical methods of analysis treated in Chapter 4 should be used first, because those methods are simpler and because they furnish an excellent means of testing the consistency of the observations.
Frequently, glaring mistakes have been made in the observations; they will show up in a graphical solution but will not be noticed in a solution by the method of least squares. Obviously, mistakes must be eliminated before reliable results can be obtained by the method of least squares. If no better method is available for obtaining approximate values of $Z_1$ and $Z_2$ in Eqs. (10-17), we can always solve a pair of the observation equations.

Solution of nonlinear equations starts with approximate values for the unknowns. Let us assume that the approximate values $A$ and $B$ have been obtained for $Z_1$ and $Z_2$, respectively. Following the method of the previous section, we set

$$Z_1 = A + z_1, \qquad Z_2 = B + z_2, \qquad (10\text{-}18)$$

where $z_1$ and $z_2$ are considered the new unknown quantities to be determined. By Taylor's theorem, we can approximate the left-hand members of the observation equations as follows:

$$f_1(Z_1, Z_2) \approx f_1(A, B) + \left(\frac{\partial f_1}{\partial Z_1}\right)_{A,B} z_1 + \left(\frac{\partial f_1}{\partial Z_2}\right)_{A,B} z_2,$$
$$f_2(Z_1, Z_2) \approx f_2(A, B) + \left(\frac{\partial f_2}{\partial Z_1}\right)_{A,B} z_1 + \left(\frac{\partial f_2}{\partial Z_2}\right)_{A,B} z_2, \qquad (10\text{-}19)$$

etc., where the subscript $A, B$ on the partial derivatives means that they are to be evaluated numerically at $Z_1 = A$ and $Z_2 = B$. The quantities $f_1(A, B)$, $f_2(A, B)$, etc., are numerical values that should nearly equal the observations $M_1, M_2$, etc., respectively, if $A$ and $B$ have been carefully chosen and if the observations are consistent.

Let

$$M_1 - f_1(A, B) = m_1, \qquad M_2 - f_2(A, B) = m_2, \qquad \text{etc.},$$

where the $m$'s are small numerical quantities that have the same residuals as the $M$'s if the approximations given in Eqs. (10-19) are sufficiently accurate for the purpose. We can then consider the $m$'s as the new observations and obtain the observation equations

$$\begin{array}{ll} \text{Equation} & \text{Weight}\\ \left(\dfrac{\partial f_1}{\partial Z_1}\right)_{A,B} z_1 + \left(\dfrac{\partial f_1}{\partial Z_2}\right)_{A,B} z_2 \doteq m_1 & w_1\\ \left(\dfrac{\partial f_2}{\partial Z_1}\right)_{A,B} z_1 + \left(\dfrac{\partial f_2}{\partial Z_2}\right)_{A,B} z_2 \doteq m_2 & w_2\\ \qquad\vdots & \;\vdots\\ \left(\dfrac{\partial f_n}{\partial Z_1}\right)_{A,B} z_1 + \left(\dfrac{\partial f_n}{\partial Z_2}\right)_{A,B} z_2 \doteq m_n & w_n \end{array} \qquad (10\text{-}20)$$

which are linear in the unknown quantities $z_1$ and $z_2$.
By this device it is always possible to obtain a linear system of equations from any system of nonlinear observation equations.

As an example, suppose an experiment has been performed in which a vessel containing an unknown liquid is placed in an air space surrounded by cracked ice, and the temperature of the liquid is observed to be $\theta_1, \theta_2, \theta_3$, etc., at times $t_1, t_2, t_3$, etc. Our purpose may be to determine the specific heat of the unknown liquid, the heat capacity of the containing vessel having been determined previously. According to Newton's law of cooling,

$$\begin{array}{ll} \text{Temperature} & \text{Weight}\\ \theta_0e^{-bt_1} \doteq \theta_1 & w_1\\ \theta_0e^{-bt_2} \doteq \theta_2 & w_2\\ \theta_0e^{-bt_3} \doteq \theta_3 & w_3\\ \qquad\vdots & \;\vdots\\ \theta_0e^{-bt_n} \doteq \theta_n & w_n \end{array}$$

are the observation equations, where in general the time $t_i$ can be measured much more accurately than the temperature. The solution of this problem follows. We obtain approximate values for the unknowns $\theta_0$ and $b$ by the graphical methods of Chapter 4. Suppose that these values are $\theta_0 = A$ and $b = B$. Then we define the most probable values of these constants by

$$\theta_0 = A + z_1, \qquad b = B + z_2,$$

where $z_1$ and $z_2$ are the new unknown quantities to be determined. Using Taylor's theorem, we find that

$$\theta_1 \doteq Ae^{-Bt_1} + e^{-Bt_1}z_1 - At_1e^{-Bt_1}z_2,$$
$$\theta_2 \doteq Ae^{-Bt_2} + e^{-Bt_2}z_1 - At_2e^{-Bt_2}z_2,$$
$$\vdots$$
$$\theta_n \doteq Ae^{-Bt_n} + e^{-Bt_n}z_1 - At_ne^{-Bt_n}z_2.$$

Let $\theta_1 - Ae^{-Bt_1} = m_1$, $\theta_2 - Ae^{-Bt_2} = m_2$, etc. The observation equations then become

$$\begin{array}{ll} \text{Equation} & \text{Weight}\\ e^{-Bt_1}z_1 - At_1e^{-Bt_1}z_2 \doteq m_1 & w_1\\ e^{-Bt_2}z_1 - At_2e^{-Bt_2}z_2 \doteq m_2 & w_2\\ \text{etc.} & \text{etc.} \end{array} \qquad (10\text{-}21)$$

This system of observation equations is now linear in the unknowns and may be solved by the general methods described previously.

We note that in the above example, as well as in the more general case, the quantity $\sum w_im_i^2$ is a zeroth-order approximation to the sum of the weighted squares of the residuals, since the $m_i$ are obtained by inserting the trial values of the unknowns in the observation equations.
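The linearization of Eqs. (10-21) lends itself to repetition: after each solution the corrections $z_1$ and $z_2$ are added to the trial values and the equations are formed again. The sketch below does this for Newton's law of cooling; the scheme is what is now usually called Gauss-Newton iteration. The function `fit_cooling`, its stopping rule, and the noise-free synthetic data are assumptions made for illustration, and no protection is included against a singular determinant or poor starting values (in practice the starting values would come from a graph, as the text advises).

```python
import math


def fit_cooling(ts, thetas, A, B, weights=None, max_iter=50, tol=1e-12):
    """Iterated linearization for theta_i = theta0 * exp(-b * t_i).

    A and B are trial values of theta0 and b. Each pass forms the linear
    observation equations (10-21) in z1 and z2, solves their weighted normal
    equations by Cramer's rule, and adds z1, z2 to the trial values.
    """
    if weights is None:
        weights = [1.0] * len(ts)
    for _ in range(max_iter):
        waa = wab = wbb = wam = wbm = 0.0
        for t, th, w in zip(ts, thetas, weights):
            e = math.exp(-B * t)
            a = e             # coefficient of z1
            b = -A * t * e    # coefficient of z2
            m = th - A * e    # m_i = theta_i - A exp(-B t_i)
            waa += w * a * a
            wab += w * a * b
            wbb += w * b * b
            wam += w * a * m
            wbm += w * b * m
        D = waa * wbb - wab * wab
        z1 = (wam * wbb - wab * wbm) / D
        z2 = (waa * wbm - wab * wam) / D
        A, B = A + z1, B + z2
        # Stop once the corrections are negligible.
        if abs(z1) < tol * max(1.0, abs(A)) and abs(z2) < tol * max(1.0, abs(B)):
            break
    return A, B


# Noise-free synthetic readings generated with theta0 = 80, b = 0.05 (invented values).
ts = [0, 10, 20, 30, 40, 50, 60]
thetas = [80.0 * math.exp(-0.05 * t) for t in ts]
theta0, b = fit_cooling(ts, thetas, A=75.0, B=0.045)
print(theta0, b)
```

Starting about 6% low in $\theta_0$ and 10% low in $b$, the first pass already lands close to the generating values, and later passes only refine them.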
Because of problems similar in nature to those raised in the discussion of Fig. 3-1 (large second or third derivatives of the theoretical function or, especially, the presence of maxima or minima for values of the unknowns near the trial values), it can turn out that $\sum w_iv_i^2$ after solution for the $z_i$ is greater than $\sum w_im_i^2$. It is beyond the scope of this book to discuss all the contingencies in this event, but usually it means that a poor choice of trial values of the unknowns was made, and the problem should be solved with another choice. Conversely, even if $\sum w_iv_i^2 < \sum w_im_i^2$, it does not follow that $\sum w_iv_i^2$ is a minimum. The entire procedure should be repeated, with the initial trial values converted to new ones by the addition of the $z_i$, the derivatives being recalculated at the new trial values, and so on.

In the example of the cooling curve, a somewhat simpler solution is possible, provided one is careful to determine new weights for the observation equations. The logarithms of the observed temperatures may be considered as the observations, and the new observation equations written as

$$\ln\theta_0 - bt_1 \doteq \ln\theta_1, \qquad \ln\theta_0 - bt_2 \doteq \ln\theta_2, \qquad \text{etc.}, \qquad (10\text{-}22)$$

where the unknowns are $\ln\theta_0$ and $b$. The actual observations are, of course, the $\theta$'s and not their logarithms. The method employed must ensure that $\sum wv^2$ is a minimum, where $v_1$ is the residual for $\theta_1$, $v_2$ the residual for $\theta_2$, etc. If the observation equations (10-21) are treated in the usual fashion, the result will be equivalent to making $\sum wv^2$ a minimum. However, treating the new observation equations (10-22) in this fashion is equivalent not to minimizing $\sum wv^2$ but to minimizing $\sum wV^2$, where the $V$'s are the residuals in the logarithms of the $\theta$'s. Thus $V_1$ is the residual in $\ln\theta_1$, $V_2$ the residual in $\ln\theta_2$, etc.
Now, $V_1$ is equal to the product of the rate of change of $\ln\theta$ with $\theta$ and the residual for $\theta_1$; i.e.,

$$V_1 = \frac{d(\ln\theta_1)}{d\theta_1}\,v_1 = \frac{v_1}{\theta_1}, \qquad V_2 = \frac{d(\ln\theta_2)}{d\theta_2}\,v_2 = \frac{v_2}{\theta_2}, \qquad \text{etc.}$$

Then $\sum wV^2 = \sum (w/\theta^2)v^2$, which amounts to giving the original equations weights equal to $w_i/\theta_i^2$ $(i = 1, \ldots, n)$. Obviously these weights are improper for the actual observations, but the discrepancy can be corrected by assigning the weights $w_i\theta_i^2$ to the new observation equations. The final result is then obtained by treating the observation equations in $\ln\theta_i$ as linear equations in the unknowns, but with weights $w_i\theta_i^2$.

From the above example it should be obvious that any manipulation of the original observation equations will usually result in improper weighting. For example, if we clear a set of observation equations of fractions by multiplying by the denominators 2, 3, etc., we will find that this process is equivalent to weighting the observation equations by the squares of these denominators. Such a weighting can be corrected by using fractional weights, but we will then find that no advantage has been gained. The safest procedure is to use the original observation equations exactly as they occur, making sure that the quantity on the right-hand side of the $\doteq$ sign is the direct observation obtained from the experiment.

10-8 COMPUTATION OF THE MEASURES OF SPREAD

It was mentioned that the true errors $x_i$ are not equal to the residuals $v_i$ unless the number of observations $n$ approaches infinity. In statistical discussions it is common to distinguish three standard deviations: $\sigma$, the standard deviation of the infinite parent distribution just defined; $\sigma'$, the best estimate of $\sigma$ that can be obtained from the given sample of readings; and $S$, the sample standard deviation, defined by $S^2 = (\sum v^2)/n$.
Since $\sigma$ cannot be found, we will assume here that our estimate of it, which we shall describe below, is sufficiently close that the difference can be ignored. If the expectation value $\mu$ of the distribution were known, then the error $x$ could be found for each reading, and the best estimate of the standard deviation of the distribution that we could obtain from $n$ observations would be given by $\sigma^2 = (\sum x^2)/n$.

In Fig. 10-1 suppose that the solid curve centered on $\mu$ represents the parent distribution. Suppose, moreover, that the average, i.e., the most probable, value of $\mu$ which is calculable from the $n$ readings is $M$. Then the dashed curve of Fig. 10-1, centered on $M$, represents the best estimate of the solid curve that can be obtained from the $n$ readings, with respect both to the location of its center and to its width. It corresponds to the crosses plotted in Fig. 9-1.

[Figure 10-1]

Having found the best estimate of the center of symmetry of the distribution, we must estimate the width of the solid curve, to be plotted as the width of the dashed curve, which includes 68.27% of the unit area under either curve. Let $B$ be the difference $M - \mu$. Then at the reading $P_i$,

$$x_i = v_i + B, \qquad x_i^2 = v_i^2 + 2v_iB + B^2, \qquad \sum x_i^2 = \sum v_i^2 + nB^2, \qquad (10\text{-}23)$$

since $M$ was found by setting

$$\sum_{i=1}^{n} v_i = 0,$$

which is equivalent to Eq. (10-1). Equation (10-23) is to be expected, since the method of least squares has made certain that $\sum v^2 \le \sum x^2$, where, of course, $v_i$ and $x_i$ refer to the same readings. The value of $B$ is almost completely unknown; however, it must be of the same order of magnitude as one of the measures of spread for a distribution of averages of $n$ readings. All of the measures of spread have the same form and are related to each other by constants that are close to one. It seems reasonable to assume, therefore, that

$$B^2 = C\sigma_0^2 = C\,\frac{\sigma^2}{n} = C\,\frac{\sum x^2}{n^2},$$

where $C$ has a value near one. Then

$$\sum x^2\left(1 - \frac{C}{n}\right) = \sum v^2,$$

and Eq.
(10-23) becomes Zx 2 = E^ 2 n n — C Obviously, the correction being considered is of no importance for large n. It is important, however, when n is rather small and we wish not to be too optimistic about the spread that may be obtained from this small number of readings. The worst possible case is where n = 1. Then it is obvious that no idea of the spread can be obtained, and the only value of C that is reasonable in this region is 1, since for one reading ]T> 2 = and n — C = 0. Thus J^x 2 /n becomes indeterminate, as should be expected. Therefore, in this case of a single variable, we set C equal to one. Thus for a single variable the best estimate of the standard deviation in a single reading is \n — 1 and the best estimate for the standard deviation for the mean is, from Eq. (10-15), o-o = J-r^vi • (10_25) \n(n — 1) The quantity (n — 1) in the above equations is the number of degrees of freedom of the system. By degrees of freedom we mean the number of observations in excess of the minimum theoretically required to obtain the unknown quantity. For example, one observation is theoretically sufficient to determine the length of a table. If ten observations are made, the system of ten observation equations has nine degrees of freedom because there are nine more observations than are theoretically required to obtain the length of the table. From this we see that if n observations are made on q unknowns, the number of degrees of freedom is (n — q), and the value of the standard deviation for observations of unit weight becomes ■1-, zy When the observations are weighted, the standard deviation for the 10-9] TREATMENT OF IDENTICAL READINGS 107 observations of unit weight is = JT&L , (10 _ 26) \n — q and a for an observation of weight w k is given by (TWk ~ Vwk ~ vw k (n - q) [see Eq. (10-14)]. 
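As a minimal sketch of Eqs. (10-24) through (10-28) for a single unknown ($q = 1$), the Python below computes the mean, the standard deviation of a single reading (or of an observation of unit weight), and the standard deviation of the mean. The function name `spread` is an illustrative assumption, not something from the text. Run on the specific-gravity data of Problem 2 at the end of this chapter, it reproduces the quoted answer.

```python
import math


def spread(readings, weights=None):
    """Single unknown (q = 1): return (mean, sigma, sigma_0).

    sigma   -- Eq. (10-24), or Eq. (10-26) for an observation of unit weight
    sigma_0 -- Eq. (10-25), or Eq. (10-28) when weights are given
    """
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings: n - 1 degrees of freedom")
    if weights is None:
        weights = [1.0] * n
    sum_w = sum(weights)
    mean = sum(w * m for w, m in zip(weights, readings)) / sum_w
    # residuals v = M - mean; weighted sum of their squares
    sum_wv2 = sum(w * (m - mean) ** 2 for w, m in zip(weights, readings))
    sigma = math.sqrt(sum_wv2 / (n - 1))
    sigma_0 = sigma / math.sqrt(sum_w)  # reduces to sigma / sqrt(n) for unit weights
    return mean, sigma, sigma_0


data = [1.0662, 1.0664, 1.0677, 1.0663, 1.0645,
        1.0673, 1.0659, 1.0662, 1.0680, 1.0654]
mean, sigma, sigma_0 = spread(data)
print(f"mean = {mean:.5f}, sigma = {sigma:.2e}, sigma_0 = {sigma_0:.1e}")
# mean = 1.06639, sigma = 1.05e-03, sigma_0 = 3.3e-04
```

Giving every reading the same weight leaves the mean and $\sigma_0$ unchanged and scales $\sigma$ by the square root of that weight, as Eq. (10-27) requires; this is a quick consistency check on the weighted branch.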
Finally, the standard deviation of the average of a set of observations of one unknown is

$$\sigma_0 = \frac{\sigma}{\sqrt{\sum w}} = \sqrt{\frac{\sum w v^2}{(n - 1)\sum w}}. \tag{10-28}$$

Where observation equations are available for more than one unknown, the computation of the standard deviation of the unknown quantities is more complicated, and we need additional discussion before we can calculate it. This problem will be treated in the next chapter.

10-9 TREATMENT OF IDENTICAL READINGS

A special case of considerable importance arises when a series of identical readings is obtained in the measurement of a single quantity. In this case, the average is the same as the individual observations. The distribution is clearly not normal; a slavish substitution into the foregoing equations gives $\sigma = 0$, which is nonsense. This case usually comes about when too coarse a scale is employed for reading the setting of an instrument; the setting can be adjusted more accurately than it can be read. Obviously, if one measures a steel bar to the nearest inch, all observations will be identical, but if one measures the same bar to the nearest 0.0001 in., he will obtain a spread.

Suppose that in measuring a steel bar to the nearest inch, ten readings of 77 in. are obtained. It is certain that the true length lies between 76.5 in. and 77.5 in., but all values between these limits are equally probable. This means the average of the ten readings should be considered to come from a rectangular distribution extending from 76.5 in. to 77.5 in. Using the results from Section 9-1 for this case, we find that

$$\sigma = \frac{d}{2\sqrt{3}} = 0.29d, \qquad \sigma_0 = \frac{\sigma}{\sqrt{10}} = 0.091d,$$

where $d$ is the width of the smallest interval observed. In this special case $d = 1$ in. The standard deviation of the result should be reported as 0.1 in. In no case should a standard deviation smaller than this be reported.

10-10 THE REJECTION OF OBSERVATIONS

In the first chapter a distinction was made between an error and a mistake.
It is obvious that if one of the readings in a group is known to be the result of a mistake, one would wish to reject it. The problem is that such mistakes usually are not identifiable, and so one comes to consider whether there are methods, connected with the distribution of the readings, by which one can make an educated guess as to whether any of the readings seem not to belong with the majority of them.

While the whole idea of discarding readings is rejected by many investigators, it is argued here that the possibility of an undetected mistake is very real. In this context "mistake" means more than a mistake by the operator in reading a scale, for instance. It is not at all uncommon for sensitive electronic equipment to respond in a transitory way to some outside influence, such as the turning on of a large motor in a neighboring laboratory, or for a transitory air current to affect the swing of a balance. Events such as these introduce a reading which is not drawn from the same universe as the remaining readings. The cost of allowing one's results to be distorted by a single bad reading is much higher than that of mistakenly rejecting one good reading out of ten, say.

In Chapter 12 we will consider methods which take into account, much more precisely than those we have used so far, the effects of the number of observations on such quantities as the probability that all the readings came from the same parent distribution. Nevertheless, using only the knowledge gained to this point, one can get results which are usefully accurate for samples even as small as five when the standard deviations of the parent distributions are as small as those with which the physical scientist is dealing most of the time.

Two bases for rejection will be set forth. The first is easier to apply and, some would say, safer. The second is more enlightening and illustrative; it is pedagogically more useful.
The first method is to decide that if the chance of the occurrence of some particular reading is less than some arbitrarily fixed number, it will be assumed to be a mistake. As usual, we take the mean of the readings to be the same as the expectation value for the distribution and assign to $\sigma$ the estimate provided by Eq. (10-24). Then we see whether any residual lies outside the value of $X$ given by Eq. (9-10) when $P_X$ is some arbitrarily chosen number. If it is decided that any residual having a chance of occurrence less than 1 in 1000, say, is to be rejected, then we let $P_X$ be $\tfrac{1}{2}(0.999)$, since $\mathcal{P}(0 < v < \infty)$ covers only half of the symmetric normal distribution. Reference to Table A shows that the area $P_X$ under the distribution reaches 0.4995 when $X/\sigma$ is 3.29. [We recall that this $\sigma$ is not the standard deviation in the mean of the readings in question; it is the standard deviation in a single reading (the measure of the spread in the distribution of the individual readings) as determined from Eq. (10-24).] Thus if we have a residual greater than

$$X = 3.29\sqrt{\frac{\sum v^2}{n - 1}},$$

we can reject the reading.

Note that, at this 0.1% level, the criterion can never reject an observation from a set of ten or fewer. For, if it did, there would have to be a residual $X$ such that

$$X^2 > (3.29)^2\,\frac{X^2 + \sum_{i=1}^{9} v_i^2}{10 - 1},$$

where the coefficient of $(3.29)^2$ on the right-hand side is the sum of the squares of the 10 residuals, including $X^2$, divided by $n - 1$, or $10 - 1$. Since $(3.29)^2/9 \approx 1.20$ exceeds one, the right-hand side is at least $1.20X^2$, and it is clearly impossible to satisfy this inequality.

Consider the example given in Table 10-4. We see that the reading 4.6 can be rejected by this criterion, but it should be noted that, though it looks badly out of place, there is not a tremendous difference between its residual and $3.29\sigma$. Readings should never be rejected on the basis of "looks."
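The fixed-probability test can be sketched in Python as below. It is a sketch under stated assumptions: `statistics.NormalDist` stands in for Table A, and the helper names are hypothetical. The same one-point-at-a-time loop also accommodates Chauvenet's criterion, described next in the text, by swapping in the threshold of Eq. (10-29), so both are shown.

```python
import math
from statistics import NormalDist  # stands in for Table A (Python 3.8+)


def mean_sigma(readings):
    """Mean and the single-reading sigma of Eq. (10-24)."""
    n = len(readings)
    mean = sum(readings) / n
    sigma = math.sqrt(sum((m - mean) ** 2 for m in readings) / (n - 1))
    return mean, sigma


def fixed_probability_limit(n, p_reject=0.001):
    """X/sigma for a fixed chance of occurrence: P_X = (1 - p_reject)/2.

    n is accepted only so that both criteria share one signature."""
    return NormalDist().inv_cdf(0.5 + (1 - p_reject) / 2)  # about 3.29 for 0.1 %


def chauvenet_limit(n):
    """X/sigma from Eq. (10-29), P_X = (2n - 1)/(4n)."""
    p_x = (2 * n - 1) / (4 * n)
    return NormalDist().inv_cdf(0.5 + p_x)


def reject_one_at_a_time(readings, limit):
    """Drop at most the single largest residual per pass, then recompute."""
    kept, rejected = list(readings), []
    while len(kept) > 2:
        mean, sigma = mean_sigma(kept)
        worst = max(kept, key=lambda m: abs(m - mean))
        if abs(worst - mean) <= limit(len(kept)) * sigma:
            break
        kept.remove(worst)
        rejected.append(worst)
    return kept, rejected


table_10_4 = [5.4, 5.3, 5.5, 4.6, 5.2, 5.3, 5.2, 5.5, 5.6, 5.3,
              5.4, 5.3, 5.5, 5.2, 5.3, 5.3, 5.2, 5.4, 5.3, 5.4]
print(reject_one_at_a_time(table_10_4, fixed_probability_limit)[1])  # [4.6]
print(reject_one_at_a_time(table_10_4[:10], chauvenet_limit)[1])     # [4.6]
```

Recomputing the mean and $\sigma$ after each rejection, rather than discarding every point beyond the first limit, follows the text's rule that only one point may be rejected at a time. On the first ten readings alone, the fixed 0.1% test rejects nothing, as the inequality above predicts.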
Table 10-4

n    M_n   v_n        n    M_n   v_n
1    5.4    0.09      11   5.4    0.09
2    5.3   −0.01      12   5.3   −0.01
3    5.5    0.19      13   5.5    0.19
4    4.6   −0.71      14   5.2   −0.11
5    5.2   −0.11      15   5.3   −0.01
6    5.3   −0.01      16   5.3   −0.01
7    5.2   −0.11      17   5.2   −0.11
8    5.5    0.19      18   5.4    0.09
9    5.6    0.29      19   5.3   −0.01
10   5.3   −0.01      20   5.4    0.09

Mean = 5.31, σ = 0.2024, 3.29σ = 0.666

After rejection, the remaining data should again be subjected to averaging and the new $\sigma$ calculated. Usually it will be found that only one reading can be rejected.

In the second method, called Chauvenet's criterion, we make the limiting probability of occurrence for "acceptable" readings depend on the number of readings in the following way. In a set of $n$ readings the number of errors that should be less than some value $X$ in absolute value is $2nP_X$, where the factor 2 appears because $P_X$ gives half the total area under the error curve for $-X < v < X$. Then if $2nP_X$ is the number of errors less than $X$, the number of errors greater than $X$ is $(n - 2nP_X)$. If $P_X$ is large enough that $(n - 2nP_X)$ is less than one half of a reading, we can suppose that it is more probable than not that an error greater than $X$ in magnitude does not belong in the same distribution as the rest of them. Thus the limit of rejection is given by

$$\tfrac{1}{2} = n(1 - 2P_X), \qquad \text{or} \qquad P_X = \frac{2n - 1}{4n}. \tag{10-29}$$

As an example, consider the first ten values in Table 10-4. This set has $\bar M = 5.29$ and $\sigma = 0.277$. With $n = 10$, Eq. (10-29) gives $P_X = 0.475$. According to Table A, $P_X$ reaches this value at $X/\sigma = 1.96$, which, for this set of data, means at $X = 0.54$. Since the error at the fourth point is 0.69, we can reject the observation by this criterion. Recalculation with nine points yields $\bar M = 5.37$, $\sigma = 0.141$, $P_X = 0.472$, $X/\sigma = 1.91$, $X = 0.27$. The largest error is 0.23, so that no further points can be rejected.

We can now improve the estimates of the parameters of the distribution generated by the dart-dropping experiment described in Chapter 7 and discussed in Chapters 8 and 9.
This is done by using Eq. (10-26), with $q = 1$ and $w$ as the number of observations in the intervals, and applying the rejection criterion just described. We then obtain the results given in Table 10-5. Since the number of observations here is so large, the value of $P_X$ remains constant to four significant figures at 0.4995 throughout. The values in the last row of Table 10-5 will be used in examples in Chapter 12.

Table 10-5

Σn    M         σ       X       Reject
500   −0.7820   3.475   11.36   −17.5
499   −0.7485   3.397   11.11   −13.5
498   −0.7229   3.352   10.96   none

It is obvious that these changes will also alter the shape of the normal distribution which was fitted to the histogram for this experiment and shown in Fig. 9-1. The appearance of the curve remains much the same, however, and recomputation will not affect the conclusions drawn from the earlier figure.

As with the previous criterion, one must never reject more than one point at a time. It can happen that more than one observation exceeds a rejection limit, but the data must be recalculated after rejection of only the largest one. The omission of a single reading which is presumed to be erroneous might alter considerably one's picture of the error distribution.

10-11 THE RANGE: A CONVENIENT ESTIMATOR

When working in the laboratory it is frequently desirable to know the approximate precision of a given set of observations. One would like to know quickly whether the desired degree of precision has been reached with the number of readings that have been taken, or whether more should be taken, without spending the time to make the elaborate calculations indicated by the foregoing treatment. One way of estimating this is to observe the difference between the largest and smallest readings. This difference, called the range, has a probability distribution, just as the individual readings do.
The distribution of the range depends, of course, on the nature of the distribution of the readings, but it is different from the latter. In particular, the distribution of the range is not a normal distribution for readings which are distributed normally.

The distribution of the range for readings which are distributed normally will be discussed in Chapter 12, but it is instructive to consider an approximation to the expectation value of the range by the methods that have been introduced already. This approximation is better for large numbers of readings, but it is quite good for as few as four readings. We assume that the largest reading is just as much greater than the mean of all the readings as the smallest is less than this mean; that is, we assume that the mean of the readings is the same as the mean of the largest and smallest, although we do not use the numerical value of this mean directly. Then if $R$ is the range, the absolute value of the residuals in the largest and smallest readings, which we assume to be equal to the errors of those readings, is $R/2$. By arguments similar to those used in the previous section we conclude that, since $(n - 2)$ of $n$ readings must certainly have residuals less than $R/2$,

$$n - 2 = 2nP_{R/2},$$

where $P_{R/2}$ is the area under the error curve between 0 and $R/2$, so that $2P_{R/2}$ is the probability of observing a residual smaller than $R/2$ in magnitude. Since there are no single readings with either $x > R/2$ or $x < -R/2$, we assume that the probability of a residual beyond $R/2$ must be small enough that no more than half of a reading would lie outside this limit; this corresponds to the argument of the previous section, where we were trying to decide whether a particular reading was part of a group.

[Figure 10-2: the ratio σ_0/R plotted against the number of readings n.]
That is, we write

$$2n\left(\tfrac{1}{2} - P_{R/2}\right) = \tfrac{1}{2};$$

we use the resulting value, $P_{R/2} = \tfrac{1}{2} - 1/(4n)$, as a second estimate of $P_{R/2}$, and we set the average of the two estimates equal to $P_{R/2}$. The result is

$$P_{R/2} = \frac{n - \tfrac{5}{4}}{2n} = \frac{4n - 5}{8n}. \tag{10-30}$$

For $n = 10$, say, $P_{R/2} = 0.438$, and reference to Table A shows that $R/2\sigma = 1.535$, or $R/\sigma = 3.07$. Since the standard deviation in the mean of the 10 readings is given by $\sigma_0 = \sigma/\sqrt{10}$, we have $\sigma_0/R = 0.103$.

The handy reference curve shown in Fig. 10-2 was prepared in this way. The numerical results of the exact derivation given in Chapter 12 are not very different from these so long as $n > 5$. For $n < 5$, the probability of finding the range to be near its expectation value becomes so low that its discussion is more an exercise in mathematics than in practicality. In practice, if 10 readings were to show a range of 0.8, say, then a quick estimate of the standard deviation of the mean of the ten readings would be 0.08.

PROBLEMS

1. For one unknown and equally weighted observations, show that

$$\sum v^2 = \sum M_i^2 - \frac{\left(\sum M_i\right)^2}{n},$$

where $M_i$ is an individual observation. Note then that the average $\bar M$ and $\sum v^2$ can be computed by going through a single summation process on a desk calculator which can accumulate a factor and a product simultaneously.

2. Ten measurements of the specific gravity of a solution gave the results:

1.0662  1.0664  1.0677  1.0663  1.0645  1.0673  1.0659  1.0662  1.0680  1.0654

Find the most probable specific gravity, the best estimate of the standard deviation for the distribution from which these were drawn, and the standard deviation in the most probable specific gravity. Answer: $\sigma = 1.05 \times 10^{-3}$, sp. gr. $= 1.06639 \pm 3.3 \times 10^{-4}$

3. For $(y, x)$-data that obey $y = a + bx$ show that

$$\sum v^2 = \sum y^2 - \frac{\left(\sum y\right)^2\left(\sum x^2\right) - 2\left(\sum y\right)\left(\sum x\right)\left(\sum xy\right) + n\left(\sum xy\right)^2}{n\sum x^2 - \left(\sum x\right)^2}.$$

Note that this expression for $\sum v^2$, as well as the one in Problem 1, involves finding a small difference between two large numbers.

4.
For $(y, x)$-data that obey $y = ax(1 + bx)$ show that the solutions for $a$ and $b$ are the same as when one treats $y = ax + cx^2$ and then sets $b = c/a$.

5. For $(y, x)$-data that obey

$$y = \frac{b}{a} + \frac{x}{a}$$

show that the solutions for $a$ and $b$ are the same as when one treats $y = A + Bx$ and then sets $a = 1/B$, $b = A/B$.

6. For $(y, x)$-data that obey $y = ax + bx^2$ it is convenient for plotting to define $Y = y/x$. In order to treat $Y = a + bx$, show what weight must be given each $Y$ in order to get the correct results for $a$ and $b$.

7. For $(y, x)$-data that obey $y = a + bx + cx^2$ find expressions for the constants that are convenient when there are an odd number $n$ of data pairs and the values of $x$ have an equal spacing $d$. (See Appendix 3. For further discussion of the fitting of polynomials, see reference 10.) Answer: For instance,

$$cd^2 = \frac{15\sum\left(12t^2 - n^2 + 1\right)y_i}{n(n^2 - 1)(n^2 - 4)}.$$

8. Measurements of the ordinates of points on a straight line corresponding to exactly known abscissas 4, 6, 8, 9 are made with results 5, 8, 10, 12. What is the most probable equation of the line and what is the best estimate of the standard deviation in an observation of unit weight? Answer: $y = -0.288 + 1.339x$, $\sigma = 0.39$

9. Solve the following equations for the most probable values of $x$ and $y$:

$$x + y \doteq 10.0 \pm 0.36, \qquad 4x - y \doteq 19.0 \pm 0.51, \qquad 2x + 3y \doteq 25.0 \pm 0.51.$$

(See Eq. (10-14). It is usually most convenient to give unit weight to the observation with the greatest error.) Answer: $x = 5.839$, $y = 4.387$

10. Four observations of the angle A of a triangle gave a mean of 36° 25′ 47″, two observations of B gave a mean of 90° 36′ 28″, and three on C gave 52° 57′ 57″. Adjust the triangle. Answer: A = 36° 25′ 44.23″, B = 90° 36′ 22.46″, C = 52° 57′ 53.31″

11. The unknowns $x$, $y$, $z$ are subject to the condition $x + 2y + 3z = 36$. Observations are made with the weights as noted: $x = 4.3$, wt. 1; $y = 5.7$, wt. 4; $z = 7.3$, wt. 9. What are the adjusted values?
Answer: $x = 3.768$, $y = 5.433$, $z = 7.122$

12. A mass analysis is run of a sample of magnesium, and the following ratios of the concentrations of the three isotopes are observed: Mg²⁴/Mg²⁵ = …, Mg²⁴/Mg²⁶ = …, Mg²⁵/Mg²⁶ = …. What is the indicated composition of the sample? Answer: Mg²⁴ 78.7%, Mg²⁵ 9.9%, Mg²⁶ 11.4%

13. Oxygen will dissolve in solid uranium dioxide to give a material whose composition is described as UO₂₊ₓ. A theory exists by which $x$ can be related to the existing partial pressure $p$ of oxygen. At 1100°C the relation is expected to be

$$\log p = 2\log\frac{x}{a - x} - 1.770x - 5.155$$

for $p$ expressed in atm. Because of the method used to measure these low pressures, it is proper to consider that $p$ is measured directly. With the following data, supposing that $x$ is known exactly, estimate a starting value of $a$ and then determine a least-squares correction to that value.

x       p × 10⁹ (atm)
0.03    5.5
0.04    9.8
0.05    15.9
0.06    21.7

Answer: $a = 1.0111$

14. The capacity of a condenser is known to be 14.0 mf. It is divided into 5 sections, a, b, c, d, e, and it is known that the difference between b and d is 1.5 mf. Weighted observations on the individual sections are:

Section   Capacity (mf)   wt.
a         2.02            3
b         4.13            2
c         2.52            5
d         2.67            7
e         2.84            4

(a) Find the most probable capacities of the sections. (b) Recognizing that the existence of condition equations increases the number of degrees of freedom, find the best estimate of the standard deviation of an observation of unit weight. Answer: (a) a = 2.008, b = 4.125, c = 2.487, d = 2.625, e = 2.755; (b) $\sigma = 0.156$

15. Readings taken on the successive nodes in a Kundt's tube experiment were 10.1, 15.1, 24.9, 34.9, 45.0 cm. Find the most probable value of the half-wavelength of the sound in the tube. Answer: $\lambda/2 = 8.96$ cm

16. Successive measurements of a particular quantity gave the results: 19.1, 17.0, 17.9, 20.5, 18.4, 22.0, 15.8, 16.5, 13.7, 14.9.
It is desired that the standard deviation in the mean be less than unity. (a) With the aid of Fig. 10-2 decide whether any more readings need be taken. (b) Find the standard deviation in the mean and compare with the estimate made in part (a). (c) The result 22.0 seems somewhat large. Use Chauvenet's criterion to see if it should be rejected. Answer: (a) $\sigma_0 \approx 0.85$ (b) $\sigma_0 = 0.80$ (c) residual at 22.0 = 4.42, $1.96\sigma = 4.98$; therefore 22.0 must be kept.

17. Observations are made of the expansion of amyl alcohol with change in temperature as follows:

V (cm³)    1.04    1.12    1.19    1.24    1.27
t (°C)     13.9    43.0    67.8    89.0    99.2

If $V = 1 + Bt + Ct^2$ expresses the law relating the volume and temperature, find the most probable values of $B$ and $C$. Answer: $B = 2.897 \times 10^{-3}$ cm³/°C, $C = -1.904 \times 10^{-6}$ cm³/(°C)²

18. (Warning: Parts (b) and (c) of the following problem require very extensive computation.) Given the following data on the index of refraction of fused quartz as a function of wavelength:

λ (10⁻⁵ cm)   n
1.936         1.560
2.313         1.519
2.749         1.496
3.404         1.479
4.340         1.467
5.086         1.462

(a) Find a method of plotting these data that would yield a straight line if they follow Sellmeier's formula,

$$n^2 = 1 + \frac{A\lambda^2}{\lambda^2 - B},$$

and estimate $A$ and $B$. (b) Supposing that the observations of $n$ are equally weighted, use the method of Section 10-7 to find least-squares corrections to the values of $A$ and $B$ found in part (a). (c) Supposing the observations of $n$ are equally weighted, use the form of equation found in (a) in order to compute $A$ and $B$ directly without the method of Section 10-7. Answer: $A = 1.0971$, $B = 0.8750 \times 10^{-10}$ cm²

(Note. The exact results will depend on the number of figures carried and on round-off errors, which will vary according to decisions made. The authors carried four decimal places in every column. The results given above were for part (c).
With one pass in part (b), i.e., without a recalculation of new corrections to those found the first time through, the authors found $A = 1.0972$, $B = 0.8747 \times 10^{-10}$ cm².)

19. Two criteria for the rejection of observations are given in the text: one based on a fixed probability, and Chauvenet's criterion. At what numbers of observations are the two equivalent when the probability used in the first is the 0.1% used in the text, 1%, and 5%? Answer:

P        n
0.001    500
0.01     50
0.05     10

CHAPTER 11
PROPAGATION OF ERRORS

In the last chapter we discussed methods of determining those values of the desired unknown quantities which are rendered most probable by the set of data in hand. The methods are general enough that they are applicable to arbitrary numbers of unknown quantities related to one another via observation equations that are linear or can be made so. On the other hand, the degree of exactitude to which an unknown has been determined was only discussed for one unknown. It is the principal business of this chapter to rectify this situation. To do so we must begin by discussing the general subject of the propagation of errors.

The purpose of treating this latter subject is to answer the question, "Given some set of numbers and their errors, what is the error in some prescribed function involving these numbers?" As a very simple example, consider the problem of determining the volume of a right circular cylinder from measurements of its diameter $d$ and height $h$. The volume is given by $V = \tfrac{1}{4}\pi d^2 h$. If we could specify the maximum possible error $\Delta d$ in $d$ and the maximum possible error $\Delta h$ in $h$, then we could easily calculate the maximum possible error $\Delta V$ in the volume. Using the methods of Chapter 3, we would find the maximum error in the volume to be

$$\Delta V = \frac{\partial V}{\partial d}\,\Delta d + \frac{\partial V}{\partial h}\,\Delta h = \tfrac{1}{2}\pi d h\,\Delta d + \tfrac{1}{4}\pi d^2\,\Delta h. \tag{11-1}$$

However, as we saw in Chapter 7, it is usually not possible to specify a maximum error, i.e., an error which is guaranteed never to be exceeded.
The standard deviation, on the other hand, can be estimated with reasonable ease. As a measure of the spread, it represents an error that has a 32% chance of being exceeded; this can be seen by referring to Table A at the end of the book. Since a maximum error cannot be found, or even defined, we must calculate the standard deviation $\sigma_V$ in the volume in terms of the standard deviations $\sigma_h$ and $\sigma_d$ in the height and diameter, respectively.

To find the relationship between $\sigma_V$, $\sigma_h$, and $\sigma_d$, we start with the fundamental definition of the standard deviation $\sigma$ of the individual readings from an infinite parent distribution:

$$\sigma = \lim_{n\to\infty}\sqrt{\frac{\sum x^2}{n}}.$$

Usually $h$ and $d$ are each obtained from the average of a series of readings, so that our final interest centers on the standard deviations of the averages. Furthermore, it is necessary to recognize that the "true" values of $h$ or $d$ are as likely to be less than as to be greater than the averages used for the values of $h$ and $d$. Hence it is not certain whether the effect of each error is to increase or decrease the error in the volume; this, too, is a statistical matter.

The above example was introduced as an illustration of the problem; rather than proceeding to its specific solution we shall consider the general problem and calculate relationships that can be used for all special problems.

11-1 GENERAL PROBLEM

In the general problem of the propagation of errors we shall assume that $G$ is to be calculated from measurements, designated as $M_1, M_2, \ldots, M_r$. It will be convenient if the reader accepts as the meaning of the symbol $M_i$ some one of the variables of which $G$ is a function and also, at the appropriate points, a particular value of that variable. If $G$ is a function $f(M_1, M_2, \ldots, M_r)$, we mean by $\partial f/\partial M_i$ the partial derivative with respect to the variable $M_i$, and that this derivative is then to be evaluated at some particular set of values of all the variables, including $M_i$.
We are concerned in this book with the handling of numbers; the derivatives then are useful only after they have been evaluated.

Let us assume that a single measurement has been made of each of the $M_i$'s and that the true errors are $x_1, x_2, \ldots, x_r$ in $M_1, M_2, \ldots, M_r$, respectively. The true error $X$ in $G$ is then given approximately by

$$X = \frac{\partial f}{\partial M_1}x_1 + \frac{\partial f}{\partial M_2}x_2 + \cdots + \frac{\partial f}{\partial M_r}x_r, \tag{11-2}$$

provided that the $x$'s are not too large. If another set of measurements is made of the $M_i$'s with true errors $x'_1, x'_2, \ldots, x'_r$, respectively, the corresponding error $X'$ in $G$ will be

$$X' = \frac{\partial f}{\partial M_1}x'_1 + \frac{\partial f}{\partial M_2}x'_2 + \cdots + \frac{\partial f}{\partial M_r}x'_r.$$

Suppose that this process is continued until $n$ observations have been made on each of the $M_i$'s. The standard deviation for the individual values of $G$ is then given by

$$\sigma_G'^2 = \lim_{n\to\infty}\frac{\sum X^2}{n},$$

where the prime on $\sigma'_G$ indicates that it is an estimate of the standard deviation of a single reading of $G$. There are $n$ equations similar to Eq. (11-2). The squares of these equations are

$$X^2 = \left(\frac{\partial f}{\partial M_1}x_1\right)^2 + \left(\frac{\partial f}{\partial M_2}x_2\right)^2 + \cdots + 2\frac{\partial f}{\partial M_1}\frac{\partial f}{\partial M_2}x_1x_2 + 2\frac{\partial f}{\partial M_1}\frac{\partial f}{\partial M_3}x_1x_3 + \cdots,$$

$$X'^2 = \left(\frac{\partial f}{\partial M_1}x'_1\right)^2 + \left(\frac{\partial f}{\partial M_2}x'_2\right)^2 + \cdots + 2\frac{\partial f}{\partial M_1}\frac{\partial f}{\partial M_2}x'_1x'_2 + 2\frac{\partial f}{\partial M_1}\frac{\partial f}{\partial M_3}x'_1x'_3 + \cdots,$$

etc. If we add these equations and divide by $n$, we obtain

$$\frac{\sum X^2}{n} = \left(\frac{\partial f}{\partial M_1}\right)^2\frac{\sum x_1^2}{n} + \left(\frac{\partial f}{\partial M_2}\right)^2\frac{\sum x_2^2}{n} + \cdots + 2\frac{\partial f}{\partial M_1}\frac{\partial f}{\partial M_2}\frac{\sum x_1x_2}{n} + 2\frac{\partial f}{\partial M_1}\frac{\partial f}{\partial M_3}\frac{\sum x_1x_3}{n} + \cdots. \tag{11-3}$$

Each of the terms that are to be added in the evaluation of $\sum x_1^2$, $\sum x_2^2$, etc., being a square, is positive. Thus the quantities $(\sum x_1^2)/n$, $(\sum x_2^2)/n$, etc., rapidly approach constant values as $n$ approaches $\infty$. The terms which are to be added to calculate $\sum x_1x_2$, $\sum x_1x_3$, etc., however, are just as likely to be positive as negative. Hence the quantities $(\sum x_1x_2)/n$, $(\sum x_1x_3)/n$, etc., rapidly approach zero as $n$ approaches $\infty$. Thus in the limit as $n$ approaches infinity, Eq. (11-3) becomes

$$\sigma_G'^2 = \left(\frac{\partial f}{\partial M_1}\right)^2\sigma_1'^2 + \left(\frac{\partial f}{\partial M_2}\right)^2\sigma_2'^2 + \cdots + \left(\frac{\partial f}{\partial M_r}\right)^2\sigma_r'^2. \tag{11-4}$$

Equation (11-4) gives the relationship between the standard deviations of the infinite parent distributions, but we are most interested in uncertainties of the final results, the averages of a finite number $n$ of the readings. For $n$ readings on each of the $M_i$'s,

$$\sigma_G^2 = \frac{\sigma_G'^2}{n}, \qquad \sigma_1^2 = \frac{\sigma_1'^2}{n}, \qquad \sigma_2^2 = \frac{\sigma_2'^2}{n}, \qquad \text{etc.},$$

where the unprimed values of $\sigma$ are for the averaged $G$, $M_1$, $M_2$, etc. Therefore

$$\sigma_G^2 = \left(\frac{\partial f}{\partial M_1}\right)^2\sigma_1^2 + \left(\frac{\partial f}{\partial M_2}\right)^2\sigma_2^2 + \cdots + \left(\frac{\partial f}{\partial M_r}\right)^2\sigma_r^2. \tag{11-5}$$

Equation (11-5) is the expression generally used for propagation of errors and is one of the most important equations in the whole subject of physical measurements. Although it is derived here on the assumption that each quantity is measured the same number of times, the result is independent of this assumption.

Equation (11-5) may now be applied to the problem of the volume of a right circular cylinder with which this chapter was introduced. We see that the result will be similar to Eq. (11-1) if $\Delta V$, $\Delta d$, and $\Delta h$ are replaced by $\sigma_V$, $\sigma_d$, and $\sigma_h$, respectively, and each of the terms is squared; i.e.,

$$\sigma_V^2 = \left(\tfrac{1}{2}\pi d h\,\sigma_d\right)^2 + \left(\tfrac{1}{4}\pi d^2\,\sigma_h\right)^2. \tag{11-6}$$

If for a cylinder of the given size we substitute for $\Delta d$ and $\Delta h$ in Eq. (11-1) the values of $\sigma_d$ and $\sigma_h$, respectively, we will obtain a value for $\Delta V$ which will necessarily be at least as great as the value of $\sigma_V$. This fact can be illustrated graphically as in Fig. 11-1. The hypotenuse $\sigma_V$ of the right triangle is less than the sum $\Delta V$ of the lengths of the sides.

[Figure 11-1: right triangle with legs ½πdh Δd and ¼πd² Δh; its hypotenuse is σ_V, while the sum of the legs is ΔV.]

People not trained in statistical procedures often use equations like (11-1) instead of (11-6) when it is the latter that they should use. An equation like Eq. (11-1) always gives a more pessimistic statement of the propagated error than does Eq. (11-6).
Although it is better to be too pessimistic than to be too optimistic in reporting the magnitude of the errors in one's experimental results, it is a disservice to one's colleagues to report a much larger error than is warranted. The scientist must build on the works of others and use the building blocks developed by others. Blocks that are thought to be badly misshapen are as useless as ones that are truly misshapen. Moreover, $\Delta V$ calculated from Eq. (11-1) cannot be described as a standard deviation. The reason for the difference between these two types of equations is obvious when one looks at the derivation of the second equation, where the cross-product terms in Eq. (11-3) were dropped because they were as likely to be positive as negative. Thus equations like Eq. (11-6) take advantage of the fact that there is a high probability that the errors in the diameter and height will tend to cancel each other.

11-2 SPECIAL CASES

While Eq. (11-5) has a general applicability to all problems of this type, a number of special cases occur so frequently that it is worthwhile remembering special formulas for them. Two observations $M_1$ and $M_2$ are required in determining a length with a meter stick, and the length is $L = M_1 - M_2$. From the general equation (11-5) we see that $\sigma_L^2 = \sigma_1^2 + \sigma_2^2$, and since the two observations are of the same kind, $\sigma_1 = \sigma_2 = \sigma$, so that

$$\sigma_L = \sigma\sqrt{2}. \tag{11-7}$$

Other special cases which arise frequently are

$$G = \sum M_i, \qquad \text{for which} \qquad \sigma_G^2 = \sum \sigma_i^2, \tag{11-8}$$

and

$$G = \sum a_iM_i, \qquad \text{for which} \qquad \sigma_G^2 = \sum a_i^2\sigma_i^2. \tag{11-9}$$

It should be noted that this last relation applies to weighted averages, in which the average is given by

$$A = \frac{w_1R_1 + w_2R_2 + \cdots}{\sum w},$$

for which

$$\sigma_A^2 = \left(\frac{w_1\sigma_1}{\sum w}\right)^2 + \left(\frac{w_2\sigma_2}{\sum w}\right)^2 + \cdots.$$

From Eq. (10-14), we find that $\sigma_1^2 = \sigma^2/w_1$, $\sigma_2^2 = \sigma^2/w_2$, etc., where $\sigma$ is the standard deviation of the observation of unit weight. Hence $\sigma_A^2 = \sigma^2/\sum w$. This last relation could be obtained directly from Eq. (10-14), since the weight of the average $A$ is $\sum w$.
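Eq. (11-5) lends itself to a generic routine. The Python sketch below is illustrative only: the helper name `propagate`, the finite-difference step, and the cylinder dimensions are assumptions, not taken from the text. It estimates each partial derivative by a central difference at the measured values, then compares $\sigma_V$ from Eq. (11-6) with the maximum-error sum $\Delta V$ of Eq. (11-1) for a cylinder whose diameter and height are each uncertain by 1%.

```python
import math


def propagate(f, values, sigmas, rel_step=1e-6):
    """Eq. (11-5): sigma_G**2 = sum_i (df/dM_i)**2 * sigma_i**2.

    Partial derivatives are estimated by central differences at the
    measured values, so the result is only as good as the linearization."""
    var = 0.0
    for i, (m, s) in enumerate(zip(values, sigmas)):
        step = rel_step * (abs(m) or 1.0)
        up, down = list(values), list(values)
        up[i], down[i] = m + step, m - step
        dfdm = (f(*up) - f(*down)) / (2 * step)
        var += (dfdm * s) ** 2
    return math.sqrt(var)


def volume(d, h):
    return math.pi * d ** 2 * h / 4


d, h = 2.000, 5.000              # hypothetical dimensions, cm
sigma_d, sigma_h = 0.020, 0.050  # 1 % of each

sigma_v = propagate(volume, [d, h], [sigma_d, sigma_h])                    # Eq. (11-6)
delta_v = math.pi * d * h / 2 * sigma_d + math.pi * d ** 2 / 4 * sigma_h  # Eq. (11-1)
print(f"sigma_V/V = {sigma_v / volume(d, h):.4f}, Delta_V/V = {delta_v / volume(d, h):.4f}")
# sigma_V/V = 0.0224, Delta_V/V = 0.0300
```

The quadrature sum gives about 2.2% against the 3% of the straight sum. The same routine handles the meter-stick case $L = M_1 - M_2$, returning $\sigma\sqrt{2}$ as in Eq. (11-7).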
One of the most important special cases is that in which

$$G = kM_1^aM_2^bM_3^c\cdots \qquad (k = \text{const}). \tag{11-10}$$

This type of equation occurs very frequently, and the resulting special formula is so simple that it can be remembered and used for mental calculations. If one applies the general equation (11-5) to the case of three variable factors, for instance, he gets

$$\sigma_G^2 = \left(akM_1^{a-1}M_2^bM_3^c\,\sigma_1\right)^2 + \left(bkM_1^aM_2^{b-1}M_3^c\,\sigma_2\right)^2 + \left(ckM_1^aM_2^bM_3^{c-1}\,\sigma_3\right)^2.$$

If the left-hand side of this relation is divided by the square of the left-hand side of Eq. (11-10) and the right-hand side by the square of the right-hand side of Eq. (11-10), the result is

$$\left(\frac{\sigma_G}{G}\right)^2 = \left(\frac{a\sigma_1}{M_1}\right)^2 + \left(\frac{b\sigma_2}{M_2}\right)^2 + \left(\frac{c\sigma_3}{M_3}\right)^2. \tag{11-11}$$

The quantities $\sigma_G/G$, $\sigma_1/M_1$, etc., are often called the fractional errors in $G$, $M_1$, etc. Since the $\sigma$'s are proportional to all other measures of spread, it is obvious that any of the equations that have been developed here can be used by merely replacing the $\sigma$'s by any other measure of spread such as p, a.d., etc.

The relation given by Eq. (11-11) is most frequently used for calculating percentage of error in problems where the relation between the measured quantities is given by Eq. (11-10). Consider the problem of the volume of a right circular cylinder given in the introduction to this chapter. It can be seen that if the spread in the errors amounts to one percent for both the diameter and the height measurements, then the percentage of error in the volume is $\sqrt{1^2 + 2^2} = 2.2\%$, an operation which can be performed mentally. When propagation-of-error problems involve equations like (11-10), by far the simplest method of calculating is given by Eq. (11-11).

At this point the reader should be warned against a very common mistake, i.e., not making sure that the various observed quantities $M_1$, $M_2$, etc., are the directly measured quantities and are not dependent on one another.
An excellent example of this mistake is found in a common college physics experiment in which the index of refraction of a slab of glass is obtained by means of a microscope with a scale indicating the vertical position of the microscope tube. In carrying out the experiment one places the slab on the microscope stage and observes the positions $M_1$ and $M_2$ of the microscope when it is focused on the top surface and the bottom surface of the slab, respectively. A third observation, $M_3$, is obtained by observing the position of the microscope tube when the slab of glass is removed and the microscope is focused on the stage. The index of refraction is given by $n = (M_1 - M_3)/(M_1 - M_2)$. The expression for $\sigma_n$ obviously contains three terms, and the term containing $\sigma_1$ will be found to be quite small because errors in $M_1$ have a relatively small effect on the index of refraction. This is true because if $M_1$ is too large, the numerator and denominator are both too large by the same amount, and the change in the value of $n$ is small. New students often write this equation as $n = t/d$ and apply the simple relation of Eq. (11-11). They fail to recognize that since $t = M_1 - M_3$ and $d = M_1 - M_2$, $t$ and $d$ are not independent, and greater accuracy is available than they have obtained. Because Eq. (11-11) is so simple to apply, it should be used wherever it is applicable. Where it is not applicable, the general Eq. (11-5) should be used.

11-3 STANDARD DEVIATIONS OF UNKNOWNS CALCULATED FROM OBSERVATION EQUATIONS

The general method of calculating standard deviations given by Eq. (11-5) can be applied to determine the standard deviations of the unknown quantities which we calculated by the method of least squares in Chapter 10. For the most general case consider the normal equations for weighted observations given by Eqs. (10-16).
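The microscope example can be made concrete with a short sketch. The scale readings (10.0, 8.0, 7.0 mm) and the common σ of 0.01 mm are hypothetical. The first function applies Eq. (11-5) to the three direct readings; the second reproduces the mistake of treating $t$ and $d$ as independent and applying Eq. (11-11).

```python
import math

def index_correlated(m1, m2, m3, sigma):
    """n = (M1 - M3) / (M1 - M2), propagated from the three direct readings (Eq. 11-5)."""
    t = m1 - m3                    # real thickness of the slab
    d = m1 - m2                    # apparent thickness
    n = t / d
    dn_dm1 = (d - t) / d ** 2      # small: M1 appears in both numerator and denominator
    dn_dm2 = t / d ** 2
    dn_dm3 = -1.0 / d
    var = sum((p * sigma) ** 2 for p in (dn_dm1, dn_dm2, dn_dm3))
    return n, math.sqrt(var), (dn_dm1, dn_dm2, dn_dm3)

def index_naive(m1, m2, m3, sigma):
    """The common mistake: treat t and d as independent and use Eq. (11-11)."""
    t = m1 - m3
    d = m1 - m2
    n = t / d
    s_t = s_d = sigma * math.sqrt(2)   # Eq. (11-7) for each difference
    return n, n * math.sqrt((s_t / t) ** 2 + (s_d / d) ** 2)

n, s_corr, partials = index_correlated(10.0, 8.0, 7.0, 0.01)
_, s_naive = index_naive(10.0, 8.0, 7.0, 0.01)
print(f"n = {n:.3f}, correct sigma_n = {s_corr:.5f}, naive sigma_n = {s_naive:.5f}")
```

The naive value comes out larger, which is the sense in which "greater accuracy is available than they have obtained."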
The solution for the typical unknown, $B$ for instance, is

$$B = \frac{1}{\Delta}\begin{vmatrix} [waa] & [waM] & \cdots & [waq] \\ [wab] & [wbM] & \cdots & [wbq] \\ \vdots & \vdots & & \vdots \\ [waq] & [wqM] & \cdots & [wqq] \end{vmatrix}, \qquad (11\text{-}12)$$

where

$$\Delta = \begin{vmatrix} [waa] & [wab] & \cdots & [waq] \\ [wab] & [wbb] & \cdots & [wbq] \\ \vdots & \vdots & & \vdots \\ [waq] & [wbq] & \cdots & [wqq] \end{vmatrix}. \qquad (11\text{-}13)$$

If we expand the numerator of Eq. (11-12) in terms of the minors of the second column, we can write $B$ as

$$B = \beta_1[waM] + \beta_2[wbM] + \cdots + \beta_q[wqM], \qquad (11\text{-}14)$$

where, for instance,

$$\beta_1 = -\frac{1}{\Delta}\begin{vmatrix} [wab] & [wbc] & \cdots & [wbq] \\ [wac] & [wcc] & \cdots & [wcq] \\ \vdots & \vdots & & \vdots \\ [waq] & [wcq] & \cdots & [wqq] \end{vmatrix}. \qquad (11\text{-}15)$$

We see, then, that the β's are independent of the observations, all of which are contained in the sums $[waM]$, $[wbM]$, etc. Obviously, Eq. (11-14) can be written in terms of the M's, since they enter into that equation in the form of summations. That is,

$$B = \tau_1 M_1 + \tau_2 M_2 + \cdots + \tau_n M_n, \qquad (11\text{-}16)$$

where the coefficient $\tau_1$ of $M_1$ is the sum of the first terms of each of the summations. Therefore

$$\tau_1 = \beta_1 w_1 a_1 + \beta_2 w_1 b_1 + \beta_3 w_1 c_1 + \cdots + \beta_q w_1 q_1.$$

Similarly, the coefficient of $M_2$ is the sum of the second terms:

$$\tau_2 = \beta_1 w_2 a_2 + \beta_2 w_2 b_2 + \beta_3 w_2 c_2 + \cdots + \beta_q w_2 q_2.$$

Thus we obtain the solution for $B$ in terms of the observations $M_1$, $M_2$, etc., so that we can apply the general equation (11-5) for propagation of error and get

$$\sigma_B^2 = (\tau_1\sigma_1)^2 + (\tau_2\sigma_2)^2 + \cdots + (\tau_n\sigma_n)^2. \qquad (11\text{-}17)$$

From Eq. (10-14), however, we find $\sigma_1^2 = \sigma^2/w_1$, etc., where as usual σ is the standard deviation of an observation of unit weight. Thus we can write Eq. (11-17) as

$$\sigma_B^2 = \sigma^2\left(\frac{\tau_1^2}{w_1} + \frac{\tau_2^2}{w_2} + \cdots\right) = \sigma^2 \sum \frac{\tau^2}{w}. \qquad (11\text{-}18)$$

It will be shown in Appendix 4 that

$$\sum \frac{\tau^2}{w} = \beta_2,$$

where $\beta_2$ is the coefficient of $[wbM]$ in Eq. (11-14). Since $\sigma_B^2 = \sigma^2/w_B$, obviously $w_B = 1/\beta_2$. This means that the simplest way to calculate the weight of $B$ is to obtain a numerical value for $\beta_2$ by merely keeping the summations $[waM]$, $[wbM]$, etc., in algebraic form during the process of solving for $B$, long enough to obtain numerical values for their coefficients.
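The Appendix 4 result $\sum \tau^2/w = \beta_2$ can be checked numerically. In this sketch the coefficients $a_k, b_k, c_k$, the weights, and the observations are random hypothetical values. Instead of expanding determinants by minors, it reads the β's off the second row of the inverse of the normal-equation matrix, which by Cramer's rule gives the same coefficients; the small Gauss-Jordan routine is only a stand-in for hand solution.

```python
import random

def invert(mat):
    """Gauss-Jordan inverse of a small square matrix given as a list of rows."""
    size = len(mat)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(mat)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

random.seed(1)
n_obs, q = 12, 3   # hypothetical: 12 observations, unknowns A, B, C
coef = [[random.uniform(-2, 2) for _ in range(q)] for _ in range(n_obs)]   # a_k, b_k, c_k
w = [random.uniform(0.5, 2.0) for _ in range(n_obs)]
M = [random.uniform(0.0, 10.0) for _ in range(n_obs)]

# Normal-equation matrix [waa], [wab], ... and right-hand sides [waM], [wbM], ...
N = [[sum(w[k] * coef[k][i] * coef[k][j] for k in range(n_obs)) for j in range(q)]
     for i in range(q)]
rhs = [sum(w[k] * coef[k][i] * M[k] for k in range(n_obs)) for i in range(q)]

beta = invert(N)[1]                        # B = beta1 [waM] + beta2 [wbM] + beta3 [wcM]
B = sum(bj * rj for bj, rj in zip(beta, rhs))

# tau_k = w_k (beta1 a_k + beta2 b_k + beta3 c_k), so that B = sum tau_k M_k (Eq. 11-16)
tau = [w[k] * sum(beta[j] * coef[k][j] for j in range(q)) for k in range(n_obs)]
B_from_tau = sum(t * m for t, m in zip(tau, M))
sum_tau2_over_w = sum(t * t / wk for t, wk in zip(tau, w))

print(B, B_from_tau)
print(sum_tau2_over_w, beta[1])   # equal, so the weight of B is 1 / beta2
```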
This generally can be done without any extra labor at the time the original normal equations are solved for $B$. To obtain the weights of $A$, $C$, etc., the normal equations should be solved in the usual manner, keeping the summations in algebraic form. The results will be

$$\begin{aligned} A &= \underline{\alpha_1}[waM] + \alpha_2[wbM] + \alpha_3[wcM] + \cdots, \\ B &= \beta_1[waM] + \underline{\beta_2}[wbM] + \beta_3[wcM] + \cdots, \\ C &= \gamma_1[waM] + \gamma_2[wbM] + \underline{\gamma_3}[wcM] + \cdots, \end{aligned} \qquad (11\text{-}19)$$

and so on for the remaining unknowns. The important coefficients are $\alpha_1$, $\beta_2$, and $\gamma_3$, which are underscored in Eqs. (11-19). It is shown in Appendix 4 that

$$w_A = \frac{1}{\alpha_1}, \qquad w_B = \frac{1}{\beta_2}, \qquad w_C = \frac{1}{\gamma_3}.$$

In addition to the determination of the errors in the constants $A$, $B$, etc., it is often desirable to know the error in the value of the function itself at some particular set of values of the independent experimental variables. This problem has been discussed by Birge [4]; as he states, the error in the constant term of the equation for a straight line is the error in the function when the independent variable is zero. The extension to the case of a power series is easy. In any other case, the general principle of Birge remains valid. We shall reproduce Birge's argument here. If we wish to have the error at some value of the independent variable $x$, say $e$, we need only make what is called a linear transformation, that is, shift the origin of $x$. Let $x' = x - e$. Then the desired error is the error in the new constant term of the equation. If we write out the solution by determinants for the constants and compare it with Eqs. (11-19), we see that $\alpha_1 = \Delta^{-1}\sum w_i x_i^2$. Hence the reciprocal of the weight of the new constant is $\alpha_1' = \Delta^{-1}\sum w_i (x_i - e)^2$, in which, as the reader should show, the value of Δ does not change. Hence, for the fitted function $z$ evaluated at $x = e$,

$$\sigma_z^2(e) = \sigma^2 \Delta^{-1} \sum w_i (x_i - e)^2.$$

11-4 AN EXAMPLE

Before going further into more complex situations, it would be well to tie down some of the above discussion with an example. For this purpose, we shall continue the example of Table 10-2.
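Birge's shifted-origin result can be checked against the equivalent covariance form $\mathrm{Var}(A) + 2e\,\mathrm{Cov}(A,B) + e^2\,\mathrm{Var}(B)$ for a straight line $z = A + Bx$. The abscissas, weights, and σ of unit weight below are hypothetical, and the helper names are illustrative only.

```python
import math

def sums(xs, ws):
    """Return [w], [wx], [wxx] and Delta = [w][wxx] - [wx]**2."""
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    return sw, swx, swxx, sw * swxx - swx ** 2

def sigma_z_shifted(xs, ws, sigma, e):
    """Birge: sigma_z(e)**2 = sigma**2 * sum w_i (x_i - e)**2 / Delta."""
    delta = sums(xs, ws)[3]
    return sigma * math.sqrt(sum(w * (x - e) ** 2 for w, x in zip(ws, xs)) / delta)

def sigma_z_covariance(xs, ws, sigma, e):
    """Same quantity built from the variances and covariance of A and B."""
    sw, swx, swxx, delta = sums(xs, ws)
    var_a = sigma ** 2 * swxx / delta    # sigma**2 * alpha_1
    var_b = sigma ** 2 * sw / delta      # sigma**2 * beta_2
    cov_ab = -sigma ** 2 * swx / delta
    return math.sqrt(var_a + 2 * e * cov_ab + e * e * var_b)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical abscissas
ws = [1.0, 1.0, 2.0, 2.0, 1.0]   # hypothetical weights
sigma = 0.5                      # hypothetical sigma of unit weight
for e in (0.0, 1.0, 2.5):
    print(e, sigma_z_shifted(xs, ws, sigma, e), sigma_z_covariance(xs, ws, sigma, e))

# Delta is unchanged by the shift x' = x - e
delta_shifted = sums([x - 2.5 for x in xs], ws)[3]
print(delta_shifted, sums(xs, ws)[3])
```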
We assume that every observation $y_i$ is a single reading and that all readings have the same weight. All the calculations are made with Eqs. (10-16) for $q = 2$ and unit $w_i$, using in Eqs. (11-19) $\alpha_1 = \Delta^{-1}\sum w_i x_i^2$ and $\beta_2 = \Delta^{-1}\sum w_i$. We then obtain the results

$$\sigma = 3.88, \qquad \sigma_B = 4.57, \qquad \sigma_A = 3.91, \qquad \sigma_z(1) = 2.63,$$

where the last standard deviation, as is indicated, is that of the value of $z$,

$$z = -2.49 + 68.08x,$$

when $x$ is unity.

11-5 INTERNAL AND EXTERNAL CONSISTENCY METHODS OF CALCULATING STANDARD DEVIATIONS FOR ONE UNKNOWN

It frequently happens in the determination of an unknown quantity that there is a series of values, each of which, together with its error, represents the average of a set of single determinations. This situation could arise either as a result of work by one individual or a group of individuals, or as a result of attempts to combine the results of several distinct groups of workers. In the former case almost all of the original observations are available, whereas in the latter case only those averages and errors that have been given are available. If all the individual readings used to find each of the averages are known, and if they are all of the same accuracy, then they should be combined and recalculated to give a single average; for all the readings came from the same universe, i.e., they all fit the same error curve, and the total number of readings allows a more precise estimate not only of the expectation value of this curve but also of its precision index, via the determination of the associated standard deviation of a single reading. If the numbers of readings are not known, or if the readings were taken with different instruments which have different error curves associated with them, then we must assume that the averages are the result of readings drawn from different universes, which may have different precision indices and, unfortunately, different expectation values because of systematic errors.
Each average constitutes a single point drawn from what may be yet another universe, whose properties must be investigated. Here, the precision with which one can make this investigation is not as great, since the number of available points is only the number of available averages. Nevertheless, within limits, by considering the problem in the proper way, one can decide whether the various averages appear to differ from one another only as much as we would expect on the basis of the random errors involved in the determination of each of them, or whether the differences seem to be greater than this. In the latter case, we would suspect the existence of systematic errors. Such errors may be due to poorly zero-adjusted meters, instrument calibrations different from the scale markings, or something much more subtle.

An excellent introduction to this problem has been given by Birge [4]; the problem has also been discussed by Topping [11]. If we are given $n$ readings on a particular quantity, the best estimate of the standard deviation of one of these readings, i.e., the best estimate of the standard deviation associated with the universe of readings from which these $n$ were drawn, is

$$\sigma = \left(\frac{\sum v^2}{n - 1}\right)^{1/2}, \qquad (11\text{-}20)$$

in which $\sum v^2$ is the sum of the squares of the differences (residuals) between the readings and their mean. The mean itself was determined by the use of $n$ observations. Thus its weight is $n$ compared with unit weight for one of the observations, and, by Eq. (10-14), its standard deviation is

$$\sigma_0 = \frac{\sigma}{n^{1/2}}. \qquad (11\text{-}21)$$

As Birge points out, Eq. (11-21) is a prediction of the standard deviation that would be found if enough observations of the mean were made so that the error of the mean could be found from an equation like (11-20), and if the same error curve held for each such observation of the mean.
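Equations (11-20) and (11-21) amount to a few lines of arithmetic; this minimal sketch uses five hypothetical readings.

```python
import math

def reading_and_mean_sigma(readings):
    """Eq. (11-20) for one reading and Eq. (11-21) for the mean of n readings."""
    n = len(readings)
    mean = sum(readings) / n
    sum_v2 = sum((r - mean) ** 2 for r in readings)   # squares of residuals about the mean
    sigma = math.sqrt(sum_v2 / (n - 1))
    return mean, sigma, sigma / math.sqrt(n)

mean, sigma, sigma_0 = reading_and_mean_sigma([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical readings
print(mean, sigma, sigma_0)
```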
Thus suppose that $N$ groups of $n$ readings each are taken, so that all the means have the same weight, and suppose that $N$ means are found. Then the calculation of $\sigma_0$ from

$$\sigma_0 = \left(\frac{\sum V^2}{N - 1}\right)^{1/2}, \qquad (11\text{-}22)$$

where $\sum V^2$ is the sum of the squares of the differences between the individual means and their mean, should be expected to nearly reproduce the result found from Eq. (11-21).* Such will be the result if all $Nn$ readings are truly taken from the same universe, that is, if they are taken with the same instruments by the same people under the same conditions.† There is complete consistency of the observations under these conditions.

* As is discussed by Birge [4], there is an error associated with the determination of the standard deviation as well as of the mean. The relative standard deviation of the standard deviation is $1/\sqrt{2n}$. If σ is the standard deviation of a single reading, of which $n$ have been taken, $\sigma/(n\sqrt{2})$ is the standard deviation of the standard deviation of the mean. See Chapter 12.

† Actually, this is a poor way of obtaining so many readings, for if there is some constant error in an instrument, such a procedure will never reveal it. It is much preferable to take many averages from many sources.

If, on the other hand, the various groups of observations are obtained by different people under varying circumstances and with different instruments, then such agreement is not to be expected. It is clear, then, that there is consistency only within individual groups. When one makes the assumption of internal consistency, he assumes that the precision index for all the groups is the same and estimates this by an appropriate averaging process applied only to the precision indices, and not to the locations of the corresponding distribution peaks. On the other hand, if one assumes that each group is drawn, effectively, from the same universe, he is assuming the existence of external consistency between the means of the separate groups.
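The contrast between Eqs. (11-21) and (11-22) can be illustrated by simulation. In this sketch all numbers are hypothetical: 200 groups of 10 readings with σ = 1 per reading, once drawn from a single universe and once with a constant offset added to each group to mimic a systematic error. Pooling the within-group variances is one possible "averaging process" for the internal estimate and is an assumption of the sketch.

```python
import math
import random

def internal_and_external(groups):
    """Two estimates of the standard deviation of a group mean (equal group sizes).

    internal: within-group scatter pooled over all groups (Eq. 11-20), divided by
              sqrt(n) as in Eq. (11-21).
    external: scatter of the N group means about their grand mean (Eq. 11-22).
    """
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    pooled = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    internal = math.sqrt(pooled / (len(groups) * (n - 1)) / n)
    grand = sum(means) / len(means)
    external = math.sqrt(sum((m - grand) ** 2 for m in means) / (len(means) - 1))
    return internal, external

random.seed(2)
N, n = 200, 10
same = [[random.gauss(5.0, 1.0) for _ in range(n)] for _ in range(N)]
# Hypothetical systematic error: each group receives its own constant offset
offsets = [random.gauss(0.0, 1.0) for _ in range(N)]
shifted = [[random.gauss(5.0 + off, 1.0) for _ in range(n)] for off in offsets]

int_same, ext_same = internal_and_external(same)
int_shift, ext_shift = internal_and_external(shifted)
print(f"same universe : internal {int_same:.3f}, external {ext_same:.3f}")
print(f"with offsets  : internal {int_shift:.3f}, external {ext_shift:.3f}")
```

With one universe the two estimates agree closely; with group offsets the external estimate is several times the internal one.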
When he properly applies an equation like (11-22), in which the individual group means are used, he will get agreement with his previous result if this latter assumption is correct.

The availability of many averages from many sources has its maximum usefulness when each of the averages is accompanied by its standard deviation and the number of readings involved. Failing this requirement, one can proceed, as indicated earlier, with the aid of the errors alone to judge whether all the averages came from the same universe.* When the reported averages are accompanied by standard deviations or other measures of spread, we can determine the relative weights by Eq. (10-14),

$$\frac{w_1}{w_2} = \frac{\sigma_2^2}{\sigma_1^2}.$$

We will frequently find it convenient to assign unit weight to the reading with the largest error, though this is not necessary. The standard deviation of the final result is also calculated from Eq. (10-14). Since the weight of the final result is equal to the sum of the separate weights of the individual averages,

$$\sigma_0 = \frac{\sigma}{\left(\sum w\right)^{1/2}}, \qquad (11\text{-}23)$$

where σ is the standard deviation of the particular average that has been assigned unit weight. This constitutes the averaging technique mentioned earlier, with which one uses information from each group to estimate the spread inherent in any one of them, which is assumed to be the same for all. Equation (11-23) then gives the standard deviation calculated on the assumption of internal consistency. That is, the σ's and w's are all determined from the spread of the individual sets of observations. The $\sigma_0$ of Eq. (11-23) is a function of the σ's of the separate averages; these σ's depend only on the consistency of the readings making up those averages. The value of $\sigma_0$ does not depend at all on how well these averages agree with one another.
* With only the standard deviations, and not the numbers of readings, available, one cannot decide whether the possibly different universes have different widths, i.e., whether a different standard deviation of single readings is to be associated with each of them.

As mentioned earlier, it often happens that these averages do not agree with one another nearly as well as we might expect from their individual σ's; either the quantities measured are not identical or there are systematic errors in one or more of these averages. The best way to make an organized examination of the extent of an apparent disagreement is to calculate the standard deviation by the method of external consistency and compare it with the result of Eq. (11-23). The equation to be used is (10-28), which is more general than Eq. (11-22). In Eq. (10-28) the weights $w$ are found as for the internal-consistency calculation, and $n$ is the number of groups. The residuals are the differences between the weighted mean and the group means. Calculating the weights in the same way as before corresponds to the assumption, made here also, that all the universes from which the different groups are drawn have the same precision index.

Birge [4] discusses numerical criteria on the basis of which one can decide whether an observed departure from equality of the results of Eq. (10-28) and Eq. (11-23) is or is not greater than one should expect by chance. Here we shall defer the discussion of such criteria until the next chapter. For our present purpose, it will suffice to say that if there is a very large difference, we must consider the groups to be inconsistent.

An example is to be found in a compilation of thermodynamic data on the alkali metals.* In Table 8 of this paper several values of the heat of vaporization of potassium were given, and some of these were starred. The starred values were those given the greatest weight by the authors of the compilation.
With the exception of one which had no probable error attached, they will be used here as an illustration. The errors given in the paper are probable errors in the listed means of the results of individual investigators. Here, in Table 11-1, they have been converted to standard deviations by the use of Eq. (9-7). The assigned weights are inversely proportional to the squares of the errors. The reading of unit weight has σ = 0.0163 kcal/mol.

It is apparent that there are systematic differences between the different observations which are much greater than the precision of any one of them. This can be seen by examining the residuals in the last column and by comparing the standard deviations as calculated by the two methods. The one calculated by external consistency is ten times as great as the one calculated by internal consistency. In a case like this, where it is quite clear that the individual averages differ by more than one should expect, it is probably best to recompute the values of the mean and its standard deviation by treating each number as a single observation of unit weight. An error which creeps in through some unrealized effect or miscalibration, for instance, cannot be corrected merely by taking more readings. In the present example, theoretical considerations* have led to the belief that it is the first entry in Table 11-1 which is correct rather than either of the two with larger weights; such additional information is not always available, however.

Table 11-1 HEAT OF VAPORIZATION OF POTASSIUM

ΔH°₀ (kcal/mol)   σ₀ (kcal/mol)   Weight w   |ΔH°₀ − weighted mean| (kcal/mol)
21.745            0.0045          13         0.0236
21.709            0.0015          118        0.0124
21.663            0.0163          1          0.0584
21.762            0.003           30         0.0406

Weighted mean: $\sum w\,\Delta H^\circ_0 / \sum w = 21.7214$ kcal/mol
Internal consistency, Eq. (11-23): $\sigma_0 = 0.0163/\sqrt{162} = 0.0013$ kcal/mol
External consistency: $\sigma_0 = \left[\sum w\,(\Delta H^\circ_0 - 21.7214)^2 / 3\sum w\right]^{1/2} = 0.013$ kcal/mol

* W. H. Evans, et al., J. Research Natl. Bur. Standards, 55, 83 (1955).
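The summary lines of Table 11-1 can be reproduced directly from its columns. This sketch uses the tabulated (rounded) weights and the σ of unit weight, 0.0163 kcal/mol; the external-consistency expression is the one written under the table, with $n - 1 = 3$.

```python
import math

# Table 11-1: heat of vaporization of potassium (kcal/mol) and the tabulated weights
dh = [21.745, 21.709, 21.663, 21.762]
w = [13, 118, 1, 30]
sigma_unit = 0.0163          # sigma of the average assigned unit weight

sw = sum(w)
mean = sum(wi * h for wi, h in zip(w, dh)) / sw
residuals = [h - mean for h in dh]

# Internal consistency, Eq. (11-23): depends only on the quoted sigmas
sigma_internal = sigma_unit / math.sqrt(sw)
# External consistency, as written under Table 11-1: depends on the scatter of the averages
n = len(dh)
sigma_external = math.sqrt(sum(wi * v * v for wi, v in zip(w, residuals)) / ((n - 1) * sw))

print(f"weighted mean  = {mean:.4f} kcal/mol")
print(f"internal sigma = {sigma_internal:.4f}")
print(f"external sigma = {sigma_external:.4f}")
print(f"ratio          = {sigma_external / sigma_internal:.1f}")
```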
A very famous case of a similar nature was a conflict between the value of the electronic charge as observed directly, by measuring the rate of movement of a charged oil drop in air between two charged plates, and as observed indirectly, by the accurate measurement of the distance between the atoms in a crystal by the use of x-rays. The resolution of the controversy is described by Birge.† The source of the discrepancy turned out to be a systematic error in the direct observation; an erroneous value had been used for the viscosity of air. An instructive quotation can be taken from an article by DuMond and Cohen‡ on the least-squares evaluation of various physical constants: "Before the oil-drop value of e had been corrected by revisions regarding the true viscosity of air, the [x-ray method] was questioned since it led to a value of e . . . then supposed to be too high. This was a fortunate thing since it led to a great deal of very careful and critical examination of all possible sources of error, both theoretical and experimental, in the above (x-ray) method ..."

Intermediate cases involve difficult decisions. When there are only two averages, a fairly good criterion is to consider the averages inconsistent if they differ by more than twice the sum of the separate σ's, and consistent if they differ by less than this quantity.

* R. J. Thorn and G. H. Winslow, J. Phys. Chem., 65, 1297 (1961).

† R. T. Birge, Phys. Rev., 48, 918 (1935).

‡ J. W. M. DuMond and E. R. Cohen, Rev. Modern Phys., 20, 82 (1948). The determination of the best values for the fundamental physical constants was discussed more recently by E. R. Cohen, J. W. M. DuMond, T. W. Layton, and J. S. Rollett in Rev. Modern Phys., 27, 363 (1955).
When two averages are found to be inconsistent by the above criterion, one should ignore the individual σ's and consider the two averages $A_1$ and $A_2$ to be of equal weight, as was suggested above for the case of several averages. The best value is then the arithmetic mean, $\bar{A} = \frac{1}{2}(A_1 + A_2)$, and the σ for this mean is given by

$$\sigma_0 = \left[\frac{\sum v^2}{n(n - 1)}\right]^{1/2}, \qquad n = 2.$$

Since

$$v_1 = A_1 - \bar{A} = \frac{A_1 - A_2}{2} \quad \text{and} \quad v_2 = A_2 - \bar{A} = \frac{A_2 - A_1}{2},$$

we find that

$$\sum v^2 = v_1^2 + v_2^2 = 2\left(\frac{A_1 - A_2}{2}\right)^2.$$

Therefore $\sigma_0 = \frac{1}{2}|A_1 - A_2|$.

11-6 INTERNAL AND EXTERNAL CONSISTENCY METHODS FOR CALCULATING STANDARD DEVIATIONS FOR MORE THAN ONE UNKNOWN

In Section 10-3 (supplemented by Appendix 4), methods were given for determining the weights of the unknown quantities which were calculated by the method of least squares using observation equations. The discussion included the possibility of using weighted observations, that is, of using two contributions to the weights of the unknown quantities. One of these is unequally weighted observations, represented by $w$ in such sums of products as $[waM]$; the other contribution is the actual distribution of the data. For example, in a straight-line case where $M = A + Bb$, if $B$ and $b$ are such that all the observations of $M$ are far from $A$, the weight of $A$ will be reduced as compared with the case of observations which are made near $b = 0$. The determination of these latter weights is a separate problem from the determination of the standard deviation of an observation of unit weight, and, in fact, is the problem discussed in Section 10-3 and Appendix 4. If many observations of $M$ are taken at each of several values of $b$, they can be averaged, group by group, and standard deviations determined for each of these several values of $M$.
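The rule of thumb for two averages and the result $\sigma_0 = \frac{1}{2}|A_1 - A_2|$ can be sketched as follows. The function names and the sample pairs are hypothetical; in the consistent case the sketch uses weights proportional to $1/\sigma^2$, which makes Eq. (11-23) equal to $(\sum 1/\sigma_i^2)^{-1/2}$.

```python
import math

def combine_two(a1, s1, a2, s2):
    """Combine two averages using the 'twice the sum of the sigmas' criterion.

    Consistent pair: weighted mean, sigma_0 = 1 / sqrt(1/s1**2 + 1/s2**2) (Eq. 11-23).
    Inconsistent pair: equal weights, sigma_0 = |A1 - A2| / 2.
    Returns (mean, sigma_0, consistent).
    """
    if abs(a1 - a2) > 2 * (s1 + s2):
        return 0.5 * (a1 + a2), 0.5 * abs(a1 - a2), False
    w1, w2 = 1 / s1 ** 2, 1 / s2 ** 2
    return (w1 * a1 + w2 * a2) / (w1 + w2), 1 / math.sqrt(w1 + w2), True

def sigma_mean_from_residuals(values):
    """sigma_0 = [sum v**2 / (n (n - 1))]**0.5 for equally weighted values."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n * (n - 1)))

print(combine_two(10.0, 0.1, 11.0, 0.1))   # hypothetical inconsistent pair
print(combine_two(10.0, 0.1, 10.1, 0.1))   # hypothetical consistent pair
print(sigma_mean_from_residuals([10.0, 11.0]))
```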
Weights can then be assigned to each of these values of $M$ which are inversely proportional to the squares of their standard deviations, unit weight being assigned to the value which has the largest standard deviation. These weights are the values of $w$ in equations such as (10-16), and the largest of the several standard deviations is the standard deviation of an observation of unit weight when errors in $A$ and $B$ are to be calculated on the basis of internal consistency. If, on the other hand, errors in $A$ and $B$ are to be calculated on the basis of external consistency, the standard deviation of an observation of unit weight is to be calculated from Eq. (10-26), and the values of $w$ to be used in this latter equation are still those described above. A complete solution of the two-parameter example is then given by the following equations:

$$A = \frac{1}{\Delta}\begin{vmatrix} [wM] & [wb] \\ [wbM] & [wbb] \end{vmatrix}, \qquad (11\text{-}24)$$

$$B = \frac{1}{\Delta}\begin{vmatrix} [w] & [wM] \\ [wb] & [wbM] \end{vmatrix}, \qquad (11\text{-}25)$$

$$\Delta = \begin{vmatrix} [w] & [wb] \\ [wb] & [wbb] \end{vmatrix}, \qquad (11\text{-}26)$$

$$w_A = \Delta/[wbb], \qquad (11\text{-}27)$$

$$w_B = \Delta/[w], \qquad (11\text{-}28)$$

$$\sigma_e^2 = [wvv]/(n - 2), \qquad (11\text{-}29)$$

$$\sigma_i^2 = \sigma_{\max}^2, \qquad (11\text{-}30)$$

$$\sigma_A = (\sigma_e \text{ or } \sigma_i)/\sqrt{w_A}, \qquad (11\text{-}31)$$

$$\sigma_B = (\sigma_e \text{ or } \sigma_i)/\sqrt{w_B}. \qquad (11\text{-}32)$$

Equations (11-24) through (11-26) follow from Eqs. (10-16). Equations (11-27) and (11-28) follow from Eqs. (11-19) and Appendix 4. In Eqs. (11-29) and (11-30) the subscripts e and i on the left-hand side designate, respectively, external and internal. Equations (11-31) and (11-32) follow from Eq. (10-14). In Eq. (11-29) the residuals $v$ are the differences between the observed values of $M$ and those calculated with the values of $A$ and $B$ given by Eqs. (11-24) and (11-25), respectively, and $n$ is the number of values of $M$. In Eq. (11-30) $\sigma_{\max}$ is the largest of the errors associated with the individual values of $M$. The errors $\sigma_e$ and $\sigma_i$ are the standard deviations of an observation of unit weight for these two consistency assumptions.
It is not difficult to show that when the weight of the observation $M_k$ is written as

$$w_k = C/\sigma_k^2, \qquad (11\text{-}33)$$

the standard deviations of $A$ and $B$ obtained by using $\sigma_e$ from Eq. (11-29) in Eqs. (11-31) and (11-32) are independent of the value of the constant of proportionality $C$. On the other hand, Eq. (11-30) implies that $C$ is $\sigma_{\max}^2$. This choice is made because it is the most convenient; as a result, the weights will generally be of a reasonable size regardless of the size of the $\sigma_k$'s, and an observation of unit weight is brought into evidence. Perhaps it should be added that if $C$ in Eq. (11-33) is set equal to unity, then $\sigma_i$ is also unity, and Eqs. (11-27) and (11-28) are directly the reciprocals of the squares of the standard errors in $A$ and $B$, respectively.

Let us investigate the following example. It is especially illustrative because it is real rather than made up, and because it differs in one important respect from the case described above (rather than in spite of that difference). The difference is that the authors of the source of the example did not make many observations of $M$ at a single value of $b$. Instead, they attempted to estimate standard deviations in the various quantities which entered into the calculation of the individual values of $M$. One might say that an attempt was made to estimate any further systematic errors which presumably could have been removed if they had been sought and found. It will be seen that, fortuitously or not, the attempt appears to have been reasonably successful.

Table 11-2 GRAPHITE VAPOR PRESSURE DATA

      Experiment 1                  Experiment 2
10⁴/T      −log p*             10⁴/T      −log p*
4.1339     4.641 ± 0.019       4.0634     4.460 ± 0.030
4.0437     4.323 ± 0.024       4.0437     4.381 ± 0.032
4.0290     4.266 ± 0.026       4.0000     4.231 ± 0.038
4.0258     4.256 ± 0.026       3.9777     4.159 ± 0.042

* The values of $M = -\log p$ used for our present purposes were not properly weighted to maintain the proper weights of the directly observed quantities (see Section 10-7); this is a complication which does not contribute to the matter being illustrated.
The example consists of measurements of the vapor pressure of graphite made by condensing a known fraction of the vapor emitted during a known time interval from a source at a known temperature.† The values to be used here are given in Table 11-2. Two different experiments were done, and we give the values of the negative logarithm of the pressure ($M$), in mm Hg, for various values of reciprocal absolute temperature ($b$) for both.

[Figure: $-\log p$ plotted against $10^4/T$ for the data of Table 11-2; the abscissa runs from about 3.98 to 4.14.]

Fig. 11-2. The "error bars," of total length twice the standard deviation given for each point in Table 11-2, represent the effects of the standard deviations estimated by the observers to be applicable to auxiliary measurements. It is clear that the precision is great enough that it alone would not prevent the discovery of the source of the systematic difference.

The results of treating these data according to the methods outlined in this section are given in Table 11-3. The errors obtained with the external consistency calculation are small compared with those from the internal consistency calculation; this shows that the random error in the experiment is smaller than the estimated errors in the measurement of auxiliary quantities, such as distances from source to target, used to compute $p$ from the observed rate of deposition of the graphite vapor.

Table 11-3 COMPARISON OF CONSISTENCY ASSUMPTIONS

Consistency   Experiment 1                        Experiment 2
assumed       A                  B                A               B
Internal      −10.07 ± 0.96      3.56 ± 0.24      −9.8 ± 2.1      3.51 ± 0.53
External      −10.066 ± 0.095    3.558 ± 0.023    −9.82 ± 0.41    3.512 ± 0.103

† Argonne National Laboratory Report ANL-4264, The Vapor Pressure and Heat of Sublimation of Graphite, by O. C. Simpson, R. J. Thorn, and G. H. Winslow (1949, unpublished). These authors did not put straight lines through the particular data which are being used for the present example.
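As a check on Table 11-3, this minimal Python sketch applies Eqs. (11-24) through (11-32) to Experiment 1 of Table 11-2 and repeats the calculation with $C = 1$ in Eq. (11-33), illustrating the statement that the resulting standard deviations do not depend on $C$. Taking the internal σ of unit weight as $\sqrt{C}$ for a general $C$ is an assumption that reduces to Eq. (11-30) when $C = \sigma_{\max}^2$; the function name is illustrative.

```python
import math

def weighted_line(b, m, sig, c=None):
    """Weighted fit of M = A + B*b following Eqs. (11-24) to (11-32).

    Weights are w_k = C / sigma_k**2 (Eq. 11-33).  By default C = sigma_max**2,
    so the point with the largest sigma has unit weight and sigma_i = sigma_max
    (Eq. 11-30); for another C the internal sigma of unit weight is sqrt(C).
    """
    if c is None:
        c = max(sig) ** 2
    w = [c / s ** 2 for s in sig]
    sw = sum(w)
    swb = sum(wk * bk for wk, bk in zip(w, b))
    swbb = sum(wk * bk * bk for wk, bk in zip(w, b))
    swm = sum(wk * mk for wk, mk in zip(w, m))
    swbm = sum(wk * bk * mk for wk, bk, mk in zip(w, b, m))
    delta = sw * swbb - swb ** 2                  # Eq. (11-26)
    a_fit = (swm * swbb - swb * swbm) / delta     # Eq. (11-24)
    b_fit = (sw * swbm - swb * swm) / delta       # Eq. (11-25)
    w_a, w_b = delta / swbb, delta / sw           # Eqs. (11-27), (11-28)
    resid = [mk - a_fit - b_fit * bk for bk, mk in zip(b, m)]
    sig_e = math.sqrt(sum(wk * v * v for wk, v in zip(w, resid)) / (len(b) - 2))  # Eq. (11-29)
    sig_i = math.sqrt(c)                          # Eq. (11-30) when C = sigma_max**2
    return {
        "A": a_fit, "B": b_fit,                   # Eqs. (11-31), (11-32) below
        "sA_int": sig_i / math.sqrt(w_a), "sB_int": sig_i / math.sqrt(w_b),
        "sA_ext": sig_e / math.sqrt(w_a), "sB_ext": sig_e / math.sqrt(w_b),
    }

# Experiment 1 of Table 11-2: b = 10**4/T, M = -log p, and the sigma of each M
b1 = [4.1339, 4.0437, 4.0290, 4.0258]
m1 = [4.641, 4.323, 4.266, 4.256]
s1 = [0.019, 0.024, 0.026, 0.026]

fit1 = weighted_line(b1, m1, s1)
fit1_c1 = weighted_line(b1, m1, s1, c=1.0)    # same data with C = 1
for key, value in fit1.items():
    print(f"{key:7s} {value: .4f}   (C = 1: {fit1_c1[key]: .4f})")
```

The output should match the Experiment 1 columns of Table 11-3 to the precision quoted there.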
Intercomparison of the numbers in Table 11-3 and the graphic comparison in Fig. 11-2 indicate that the rate of change of $p$ with temperature was determined more precisely than the absolute value of $p$ at any one temperature. That is, the slopes of the two lines agree quite well; it is the intercepts which are in disagreement. In the temperature region in which the data were taken, the lines are separated by amounts comparable to the estimated errors in ln p. This does not prove that there was no unrecognized systematic difference between the two experiments, but the odds favor the argument that better measurements of the auxiliary quantities would remove the systematic difference.

11-7 REJECTION OF OBSERVATIONS: MORE THAN ONE UNKNOWN

Table VI in the report* from which the example of the previous section was taken provides an example in which we can examine the process of looking for excessively large residuals when there is more than one quantity to be determined by the method of least squares. The pertinent data from that table are reproduced here in the first three columns of Table 11-4. We extend the procedure outlined in Section 10-10 to this case by using the criterion with which we calculate the probability of occurrence of a residual of a particular size from the size of the sample. The present example is a straight-line case; two unknowns are to be determined. It is assumed that the observations are externally consistent, and the procedure amounts to an examination of the validity of that assumption.

The progress of the examination is shown in Table 11-4. The σ calculated for each trial is that of Eq. (11-29). After the first trial it is found that for Run 1, 0.0753 > 0.0722; this run is discarded. After the second trial it is found that for Run 2, 0.0576 > 0.0546; this run is also discarded.

* O. C. Simpson, R. J. Thorn, and G. H. Winslow, loc. cit.
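The iterative procedure just described can be sketched in Python using the data reproduced in Table 11-4. Each trial fits an equally weighted straight line, computes σ from Eq. (11-29), sets the limit $X$ from the probability $(2n-1)/4n$, and discards only the largest residual if it exceeds $X$. Using `statistics.NormalDist().inv_cdf` in place of a printed normal-error table is an assumption of this sketch, as are the function names.

```python
import math
from statistics import NormalDist

# Table 11-4: run number, y_II, and 10**4/T
runs = [12, 11, 1, 10, 3, 9, 2, 8, 4, 7, 5, 15, 6, 14, 13]
y = [2.425, 2.520, 2.453, 2.685, 2.787, 2.964, 3.087, 3.262,
     3.427, 3.583, 3.703, 3.824, 3.863, 3.972, 4.054]
x = [3.973, 4.000, 4.008, 4.044, 4.063, 4.108, 4.154, 4.186,
     4.223, 4.266, 4.294, 4.316, 4.333, 4.350, 4.365]

def fit_line(xs, ys):
    """Equally weighted least-squares line y = A + B x, with sigma from Eq. (11-29)."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - xm) ** 2 for xi in xs)
    b = sum((xi - xm) * (yi - ym) for xi, yi in zip(xs, ys)) / sxx
    a = ym - b * xm
    resid = [a + b * xi - yi for xi, yi in zip(xs, ys)]   # calculated minus observed
    sigma = math.sqrt(sum(v * v for v in resid) / (n - 2))
    sigma_a = sigma * math.sqrt(1 / n + xm ** 2 / sxx)
    sigma_b = sigma / math.sqrt(sxx)
    return a, b, sigma, sigma_a, sigma_b, resid

def reject_outliers(runs, xs, ys):
    keep = list(range(len(xs)))
    while True:
        a, b, sigma, sa, sb, resid = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
        n = len(keep)
        p_x = (2 * n - 1) / (4 * n)                          # e.g. 0.4833 for n = 15
        limit = NormalDist().inv_cdf(0.5 + p_x) * sigma      # X = (X/sigma) * sigma
        worst = max(range(n), key=lambda j: abs(resid[j]))
        print(f"n={n:2d}  A={a:.4f}  B={b:.4f}  sigma={sigma:.4f}  X={limit:.4f}")
        if abs(resid[worst]) <= limit:
            rejected = [runs[i] for i in range(len(runs)) if i not in keep]
            return a, b, sa, sb, sigma, rejected
        keep.pop(worst)   # discard only the largest residual, then refit

A, B, sA, sB, sigma, rejected = reject_outliers(runs, x, y)
print("rejected runs:", rejected)
print(f"y = {A:.3f} ± {sA:.3f} + ({B:.3f} ± {sB:.3f}) (10^4/T)")
```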
Table 11-4

Run   y_II    10⁴/T   v₁        v₂        v₃
12    2.425   3.973   −0.0426   −0.0245   −0.0181
11    2.520   4.000   −0.0250   −0.0085   −0.0024
1     2.453   4.008   +0.0753
10    2.685   4.044   −0.0066   +0.0074   +0.0131
3     2.787   4.063   −0.0294   −0.0165   −0.0110
9     2.964   4.108   −0.0188   −0.0085   −0.0035
2     3.087   4.154   +0.0500   +0.0576
8     3.262   4.186   +0.0084   +0.0141   +0.0184
4     3.427   4.223   −0.0024   +0.0013   +0.0052
7     3.583   4.266   +0.0209   +0.0220   +0.0255
5     3.703   4.294   +0.0176   +0.0171   +0.0204
15    3.824   4.316   −0.0117   −0.0134   −0.0104
6     3.863   4.333   +0.0202   +0.0175   +0.0203
14    3.972   4.350   −0.0180   −0.0216   −0.0189
13    4.054   4.365   −0.0374   −0.0420   −0.0394

First trial: $y_{\rm II} = -14.1798 + 4.1687\,(10^4/T)$; σ = 0.0339; $(2 \times 15 - 1)/(4 \times 15) = 0.4833$; $X/\sigma = 2.13$; $X = 0.0722$
Second trial: $y_{\rm II} = -13.9325 + 4.1110\,(10^4/T)$; σ = 0.0260; $(2 \times 14 - 1)/(4 \times 14) = 0.4821$; $X/\sigma = 2.10$; $X = 0.0546$
Third trial: $y_{\rm II} = -13.8876 + 4.1013\,(10^4/T)$; σ = 0.0202; $(2 \times 13 - 1)/(4 \times 13) = 0.4808$; $X/\sigma = 2.07$; $X = 0.0418$
Conclusion: $y_{\rm II} = -13.888 \pm 0.174 + (4.101 \pm 0.041)(10^4/T)$

After the third calculation it is found that there is no residual with an absolute value greater than 0.0418. At this last trial, we see that when a residual is so large that its probability of occurrence is less than 3.8%, we can assume that its existence invalidates the assumption of external consistency. This is more stringent than the 5% level which is often used for this sort of consideration. The 5% level occurs at $P_x = 0.475$, $X/\sigma = 1.96$, $X = 0.0396$ for the third trial. While there is also no residual greater than this, the one at Run 13 is essentially equal to it.

PROBLEMS

1. Multiply 630.45 ± 0.62 by 25.635 ± 0.024 and give the complete result.
Answer: 16162 ± 22

2. Add 21.42 ± 0.61, 338.161 ± 0.042, and 543.1 ± 1.5 and give the complete result.
Answer: 902.7 ± 1.6

3. The dimensions of a right circular cylinder are found to be:
length 12.183 ± 0.024 cm
diameter 4.242 ± 0.021 cm
where the errors are standard deviations.
Find the volume and its standard deviation.
Answer: 172.2 ± 1.7 cm³

4. A rectangular steel rod of width $b$ and depth $d$ is supported at its ends and loaded at its center by a weight $W$. If the length of the rod between its supports is $l$, and $a$ is the deflection at the center, then

$$a = \frac{W l^3}{4 E b d^3},$$

where $E$ is the modulus of elasticity. Measurements give $b = 8.113 \pm 0.042$ mm, $d = 10.50 \pm 0.025$ mm, $l = 1.000$ m precise to 1 in 5000, $a = 2.622$ mm precise to 0.25%, and $W = 2$ kg precise to 0.02 g.
(a) Compute the modulus of elasticity.
(b) Assume that each measure of spread is a standard deviation and calculate separately the percent spread each would produce in $E$ if the other spreads were negligible.
(c) Calculate the standard deviation in $E$ due to all components.
Answer: (a) $2.102 \times 10^9$ gm/cm²; (b) $W$, $10^{-3}$%; $l$, $6 \times 10^{-2}$%; $a$, 0.25%; $b$, 0.52%; $d$, 0.71%; (c) $1.9 \times 10^7$ gm/cm²

5. Show that the results of Eqs. (11-24) through (11-29) are obtained correctly if each original observation equation is multiplied through by the square root of its weight and the set is then treated as being equally weighted. Is this true for the general linear problem?

6. For $(y, x)$-data which satisfy $y = ax + bx^2$, where the observed values of $y$ are equally weighted and the values of $x$ are known exactly, show that the equation can be viewed as the result of applying the procedure of Problem 5 to data of the form $(y/x) = a + bx$, weight $x^2$.

7. The quantities $n$ and $k$, to be combined into

$$y = \frac{A n k}{(n + 2)^2},$$

are found to have the values 2.00 ± 0.04 and 0.250 ± 0.005, respectively, where the errors are the standard deviations.
(a) Without introducing any constant of proportionality, find the product of $y$ and the square root of its weight.
(b) If $A = 2.00$ exactly, find the standard deviation in $y$.
Answer: (a) 25.8; (b) $1.08 \times 10^{-3}$

8. The specific gravities of zinc sulfate solutions were found to be:

sp. gr.   % ZnSO₄
1.020     2
1.040     4
1.060     6
1.087     8
1.110     10
1.129     12
1.155     14
1.185     16

(a) Use $y = \text{sp. gr.} - 1$ and plot the data in order to judge whether it is satisfactory to consider the specific gravity to be a linear function of the composition.
(b) Finding this unsatisfactory, evaluate $a$ and $b$ in $y = ax + bx^2$, where $x$ is the percent of ZnSO₄.
(c) Evaluate $\sigma_e$ and consider whether any of the points should be rejected.
(d) Evaluate $\sigma_a$, $\sigma_b$, $\sigma_y(1)$, $\sigma_y(9)$.
(e) Plot the curve derived in part (b) on the graph made in part (a). Show the spread at $x = 1$ and at $x = 9$ with error bars of total length $2\sigma_y$, as in Fig. 11-2.
Answer: $a = (9.63 \pm 0.33) \times 10^{-3}$, $b = (1.16 \pm 0.25) \times 10^{-4}$, $\sigma_y(1) = 3.1 \times 10^{-4}$, $\sigma_y(9) = 1.25 \times 10^{-4}$. No points can be rejected.

9. Verify Table 11-3 from the data of Table 11-2. Make the initial calculations of $A$ and $B$ to four decimal places.

10. In a particular experiment, $(y, x)$-data are expected to obey $y = A + Bx$, and the best values of $A$ and $B$ are desired. It is assumed that $x$ is known more exactly than $y$. Measurements of $y$ at some fixed value of $x$ showed that a standard deviation of ±0.15 could be expected in a single observation. The following data are then taken:

x      y
0.40   2.50
1.00   4.95
1.50   6.50
2.20   7.00
2.60   8.75
3.00   10.00

Find the best values of $A$ and $B$ and their standard deviations.
Answer: Solution according to the experimenter's assumption shows that a serious mistake was made; $\sigma_e$ is about four times 0.15. Recalculation with $y$ as the exactly known variable gives $A = 1.71 \pm 0.69$, $B = 2.75 \pm 0.27$. Note that an initial plot of the data, with error bars shown on $y$, would have revealed the mistake immediately.

11. Measurements of the vapor pressure of uranium dioxide were made by heating in a good vacuum a sample contained in a cylindrical tungsten crucible having a small circular orifice of radius $r$ in the center of its lid.
Part of the vapor streaming out of the orifice was condensed on a circular target of radius R at a distance d vertically above the orifice and concentric with it. If W is the number of grams condensed on the target in t seconds when the temperature of the sample is T degrees Kelvin, the pressure in atmospheres is given by

$$p = 1.3726 \times 10^{-3}\,\frac{W\sqrt{T}}{G\,t},$$

where G, called the geometry factor, is

$$G = \frac{\pi r^2 R^2}{d^2 + R^2}.$$

The construction is such that d remains constant as the temperature is raised. It can be assumed that t and T are known exactly. Prior study shows that one can take 10 µg as the standard deviation in an observation of W. The following measurements pertaining to the geometry are made at room temperature: d = 11.286 ± 0.008 cm, R = 0.9507 ± 0.0004 cm, r = 0.0716 ± 0.0009 cm. The following data are then taken:

T (°K)   t (sec)   W (µg)
2617     180       252
2688     120       314
2720     120       595
2778      60       544
2809      30       370

The object of the experiment is to determine the dependence of p on T; the principal terms in such a relation are given by

$$\log p = A + \frac{B}{T},$$

but it is frequently necessary to include a term in log T or in T^-2. It is desired to exert proper control of the experiment by comparing the internal and external consistencies of the results.
(Note. This problem has been adapted from R. J. Ackermann, The High Temperature, High Vacuum Vaporization and Thermodynamic Properties of Uranium Dioxide, Argonne National Laboratory Report ANL-5482. The final least-squares results are not asked for, though that could be done; that calculation is adequately illustrated in other problems. The purpose here is to illustrate the steps which must precede, in a real laboratory situation, the solution of normal equations.)
(a) What possible systematic source of error of the sort referred to in Section 5-1 as a theoretical error must be examined to see if it need be considered?
(b) Examine the separate sources of error in G. Need they all be included?
(c) Can any parts of the total problem be done by careful slide-rule work, and if so, what parts?
(d) Should correction be made for the possible systematic error referred to in (a)? If so, how carefully need it be done?
(e) Examine the separate sources of error in p. Need they all be included?
(f) Compute the values of log p and plot them against 10^4/T.
(g) Compute the individual errors in log p and show them as error bars on the graph made in (f). [See part (e) of Problem 11-8.]
(h) Does a third term in the equation for log p seem called for?
(i) Does there appear to be reasonable consistency between the size of the error bars and the scatter in log p?
Answer: (a) Increase in area of the orifice at the high temperatures. (b) No, only the error in r need be included. (c) Everything can be done by careful slide-rule work except the determinations of T^-1 and the solution of the normal equations. The numerical coefficient in the equation for p is then converted to 1.373 × 10^-3. (d) Since the precision error in r is not large compared with the change in r by expansion, correction must be made for the latter, but it need only be made by using the average temperature of the experiment. (The authors used 4.6 × 10^-6 as the thermal expansion coefficient of tungsten, 2700 °K as the average temperature of the experiment, and 300 °K as room temperature.) (e) The fractional errors in W and G are of comparable size, and both must be included. (f) and (g) Typically, log p(atm) = -3.07 ± 0.020 at 10^4/T = 3.821. (h) Yes. (i) Yes.

12. In the Behr free-fall experiment, where timing is done with a tuning fork (see Section 10-5), the locations of the peaks of the trace can be chosen much more precisely when the fork is dropping slowly than when it is dropping rapidly near the end of its fall, where the amplitude of vibration also has become small.
In order to estimate the weights of the various observations, consider the equation of the trace to be

$$y = y_0 e^{-kt} \sin \omega t, \qquad t = \sqrt{2x/g}.$$

Here ω is given by the frequency of the fork, k is a measure of the rate of decrease of amplitude of the fork, x is the distance it has fallen, and g is the acceleration due to gravity. By supposing that it is necessary to be able to detect some fixed change in the deflection of the fork from a peak deflection, show that the error in x at each peak measured is proportional to $\sqrt{x}$. With a fork of frequency 150 vib/sec and observations of x taken every 8 complete vibrations starting from an arbitrarily chosen zero, the following observations are made:

Observation number   x (cm)
0                     1.4
1                     5.4
2                    12.8
3                    21.5
4                    35.0
5                    49.2

(a) Using an observation equation of the form $x = x_0 + v_0 t + \tfrac{1}{2} g t^2$, find the properly weighted value of g and its standard deviation indicated by these observations.
(b) Does the size of the standard deviation reflect the precision or the accuracy of the experiment?
(c) Considering the value of g obtained, the size of its standard deviation, and that the known value of g is about 981 cm/sec^2, would it be sensible to include a frictional drag in the statement of the observation equation in order to get a better value for g, or would it be more sensible to use some other experimental arrangement?
Answer: (a) g = 962 ± 48 cm/sec^2; (b) precision; (c) Clearly frictional drag represents a systematic error. The inclusion of a term for it in the observation equation would improve the accuracy, but the precision is too poor to warrant doing this. An improved experimental arrangement is called for.
(Note. Actually, the precision of this experiment is much better than the made-up data here would indicate. The problem was put together in this way to combine an illustration of an approach to the problem of determining weights of observations and of the sort of considerations made in part (c).)
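The rejection limits in the worked example of Table 11-4 can be reproduced in a few lines. The sketch below is in Python and uses the standard-library `statistics.NormalDist` in place of Table A; the helper names are illustrative, not from the text. It assumes, as the example does, that with N residuals a residual is rejected when its magnitude exceeds the X for which the normal-curve area from 0 to X/σ equals $P_x = (2N - 1)/(4N)$. Since $1 - 2P_x = 1/(2N)$, this is equivalent to Chauvenet's criterion: the expected number of deviations that large among N readings is one half.

```python
from statistics import NormalDist

def rejection_limit(n_points, sigma):
    """Return (P_x, X/sigma, X) for the criterion P_x = (2N - 1)/(4N),
    P_x being the unit-normal area from 0 to X/sigma (illustrative helper)."""
    p_x = (2 * n_points - 1) / (4 * n_points)
    # The area from 0 to z is cdf(z) - 0.5, so invert the cdf at 0.5 + P_x.
    z = NormalDist().inv_cdf(0.5 + p_x)
    return p_x, z, z * sigma

# (N, sigma) for the three trials, and the residuals v1, v2, v3 of Table 11-4
# (runs rejected in an earlier trial are omitted from the later lists).
TRIALS = {1: (15, 0.0339), 2: (14, 0.0260), 3: (13, 0.0202)}
RESIDUALS = {
    1: [-0.0426, -0.0250, 0.0753, -0.0066, -0.0294, -0.0188, 0.0500, 0.0084,
        -0.0024, 0.0209, 0.0176, -0.0117, 0.0202, -0.0180, -0.0374],
    2: [-0.0245, -0.0085, 0.0074, -0.0165, -0.0085, 0.0576, 0.0141, 0.0013,
        0.0220, 0.0171, -0.0134, 0.0175, -0.0216, -0.0420],
    3: [-0.0181, -0.0024, 0.0131, -0.0110, -0.0035, 0.0184, 0.0052, 0.0255,
        0.0204, -0.0104, 0.0203, -0.0189, -0.0394],
}

def flagged_residuals():
    """Residuals whose magnitude exceeds the limit X in each trial."""
    out = {}
    for trial, (n_points, sigma) in TRIALS.items():
        _, _, limit = rejection_limit(n_points, sigma)
        out[trial] = [v for v in RESIDUALS[trial] if abs(v) > limit]
    return out

for trial, (n_points, sigma) in TRIALS.items():
    p_x, z, limit = rejection_limit(n_points, sigma)
    print(f"trial {trial}: P_x = {p_x:.4f}, X/sigma = {z:.2f}, X = {limit:.4f}")
print(flagged_residuals())
```

The exact inverse gives X/σ ≈ 2.128 for N = 15, so the printed X may differ from the tabulated 0.0722 in the last digit; the decisions (Run 1, then Run 2, then none) are unchanged.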
CHAPTER 12

INTRODUCTION TO STATISTICAL ANALYSIS

Reference was made more than once in the previous chapters to an important difference between the situations in the broad field of error analysis in which the physical scientist finds himself and those in which the biological, medical, or social scientists generally find themselves. We emphasized the difference in the relative sizes of the standard deviations in the variety of measurements made by these two groups. The comparative sharpness of his error distribution curves has enabled the physical scientist to make tremendous progress with small samples and unsophisticated methods of error analysis. This, however, is not the only difference.

Of equally great importance is the fact that investigations in the physical sciences usually involve a known relationship or a well-defined proposed relationship between groups of experimental variables and fixed parameters. The object of the investigation is usually to determine the values of the parameters. We can often safely assume that all but one of the experimental variables are known with practically absolute certainty. That is, we can assume that all those uncontrolled factors which produce random departures from exact agreement with the relationship affect only the one variable. An example is the vapor pressure of graphite as a function of temperature, which we discussed in Sections 11-6 and 11-7. The error distribution is one-dimensional.

If, on the other hand, one proposed to investigate the relationship, which is only vaguely definable, between the height and weight of 25-year-old men, for example, it is clearly impossible to make any such assumption. While one might expect a tendency toward an association of large weight and large height, and vice versa, there is no basis on which one could assign the status of independent variable to one variable, the status of dependent variable to the other, and then make an error analysis only of the latter.
The uncontrolled factors, such as frame structure, diet, amount and nature of 25 years of exercise, degree of normality of glandular function, and so on, affect both the height and weight in ways that could be studied separately. But in the restricted context proposed above, they are completely unknown. Thus one would have to treat both the height and weight as random variables. The "error" distribution would be two-dimensional.

Faced with problems of this sort, and with large standard deviations, the biological and social scientists have been forced to do much more than merely use large samples. They have had to help themselves also by the development of much more erudite and, of course, complex methods of analysis than we have discussed so far in this book. A natural by-product of this development is the growing realization by the physical scientist of the usefulness of such methods. He has begun to wonder, to refer to a previous example, whether there is any chance that he has fooled himself by not considering the temperature as a random variable. The intent of this chapter, then, is to provide an elementary introduction to some of these more advanced methods.

Most of our discussion will be concerned with more detailed investigations of the properties of one-dimensional distributions. Our examinations of the expectation value of the range in Section 10-11 and of the methods of internal vs. external consistency in Sections 11-5 and 11-6 were qualitative or approximate approaches to the types of analysis which we now wish to make more precise. Briefly, we wish to put on a sounder numerical basis the problem of deciding whether two or more alternative results of experimentation or observation are definitely different, or whether the differences can be explained in terms of random or chance errors of observation. It is, of course, impossible to give an absolute answer to such a question.
It must be realized that statistical analysis is not like plane geometry or counting. In a discipline of the latter kind one knows, for instance, that 501 is different from 502. In statistics, on the other hand, the questions are, "When we take into consideration all we know about the sources of these numbers, what is the probability that the differences are due merely to chance?" and, "When we consider the consequences, i.e., count the cost, of an erroneous decision, at what level of probability are we sufficiently convinced of the validity of a result that we will base our future actions on it?"

Before proceeding with the principal subject matter of the chapter, it will be convenient to discuss an extension of an earlier topic. In Sections 6-3 through 6-6 we considered the performance of an experiment such as the tossing of n coins or the rolling of n dice simultaneously. We arrived at the binomial distribution. We will now consider some other aspects of this problem to prepare ourselves for some of the procedures we shall develop later in the chapter.

12-1 EXTENSION TO N TRIALS

Suppose that we designate as a success some particular result of an experiment with a single simple element, such as a die. This result could be the appearance of any particular face, or of either of two particular faces. Let the probability of this success be called p. The probability of failure, $1 - p$, will be called q. We found that if we made a simultaneous trial with n such elements, the probability of achieving r successes, where r is some number between 0 and n, is given by

$$\mathcal{P}(n, r) = C(n, r)\, p^r q^{n-r}, \tag{12-1}$$

in which the $C(n, r)$ are the coefficients in the binomial expansion of $(p + q)^n$. We also found the expectation value and standard deviation to be given by $\mu = np$ and $\sigma = \sqrt{npq}$, respectively. The appearance of the multiplicative factor $C(n, r)$ means that it does not matter which element or elements produce success in a toss.
Thus the problem remains unchanged if, instead of trying n elements simultaneously, we try a single element n times, except that we will now say that it does not matter which of the n successive trials result in successes. After the completion of n trials, we will find a probability distribution for the number r of successes, and this probability distribution is given by Eq. (12-1).

We can use the same line of argument for N trials, each of which is the simultaneous trial of n elements. Again we will define success to suit our purposes of the moment; there is no absolute definition of success in this field. Suppose that success in one of the N trials is the appearance of some fixed number, say r, of a particular result in a single trial of the n elements. It might be the appearance of two fives in a single toss of 10 dice. Keeping in mind that in this experiment a single elementary trial involves n elements, we see that the probability of success in a single trial is the $\mathcal{P}(n, r)$ given by Eq. (12-1). It does not matter which of the N trials result in success. Consequently, the probability that R of the N trials will result in success is given by

$$\mathcal{P}(N, R) = C(N, R)\,[\mathcal{P}(n, r)]^R\,[1 - \mathcal{P}(n, r)]^{N-R}. \tag{12-2}$$

Furthermore, by the same arguments as used previously, we find that the expectation value is

$$\mu = N\,\mathcal{P}(n, r) \tag{12-3}$$

and the standard deviation is

$$\sigma = \sqrt{N\,\mathcal{P}(n, r)\,[1 - \mathcal{P}(n, r)]}. \tag{12-4}$$

Equation (12-3) agrees with our intuitive notion that the expected number of successes is just the number of trials multiplied by the probability of success in a single trial. The arguments used in Section 6-6 also apply here, however. One could hope to demonstrate the result with certainty only by the impossible procedure of approaching an infinite number of trials. Such an imaginary procedure, however, would merely demonstrate the validity of the primary distribution $\mathcal{P}(n, r)$.
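Equations (12-1) through (12-4) translate directly into code. The following Python sketch (standard library only; the function names and the choice N = 20, R = 6 are illustrative assumptions, not from the text) takes the text's example of success in a single trial, two fives in one toss of 10 dice, and then treats N such tosses as a binomial experiment. It also applies the same formulas to the Poisson example of Table 6-5, where p = 0.1575 and N = 10^4.

```python
from math import comb, sqrt

def binomial_prob(n, r, p):
    """Eq. (12-1): probability of exactly r successes among n elements (or trials)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def n_trial_summary(N, p_single):
    """Eqs. (12-3) and (12-4): expectation value and standard deviation of the
    number of successes in N trials, each succeeding with probability p_single."""
    return N * p_single, sqrt(N * p_single * (1 - p_single))

# Success in one trial: exactly two fives in a single toss of 10 dice.
p_two_fives = binomial_prob(10, 2, 1 / 6)

# Eq. (12-2) for an illustrative N = 20 tosses: probability that R = 6 succeed.
N, R = 20, 6
p_R_of_N = binomial_prob(N, R, p_two_fives)
mu, sigma = n_trial_summary(N, p_two_fives)

# The Poisson example of Table 6-5: 10^4 one-sec intervals, p = 0.1575.
mu_big, sigma_big = n_trial_summary(10**4, 0.1575)

print(f"P(10, 2) = {p_two_fives:.4f}")
print(f"P(N={N}, R={R}) = {p_R_of_N:.4f}, mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"Table 6-5 example: mu = {mu_big:.0f}, sigma = {sigma_big:.1f}")
```

Note that the same `binomial_prob` serves for both levels: once with the elementary probability 1/6, and once with the single-trial probability $\mathcal{P}(n, r)$ in place of p, which is exactly the point of Eq. (12-2).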
Our choice of using the binomial distribution as the point of departure to arrive at conclusions for N trials of an experiment was merely a matter of convenience. The argument was clearly independent of that choice; it would have proceeded in exactly the same way if we had used some other probability function than $\mathcal{P}(n, r)$. The latter could be replaced by some term in a Poisson distribution, or even by the integral over some particular interval of a continuous distribution, such as the normal distribution, without altering the result. In other words, when one performs N trials of an experiment in which the probability of success in a single trial is p and that of failure is consequently $1 - p$, he is performing a binomial experiment, regardless of the form of p. The expectation value is $Np$, and the standard deviation is $\sqrt{Np(1 - p)}$.

Let us emphasize these remarks particularly for the case where p is drawn from a Poisson distribution. The distribution of the results of N trials of an experiment governed by a Poisson distribution is not itself a Poisson distribution. The value of N will usually be large but finite, and that of p small but also finite. As an example, suppose that for the Poisson distribution of Table 6-5 we made 10^4 observations to determine the expected number of one-sec intervals in which six counts would appear. We would compare the result with an expectation value of 10^4 × 0.1575, or 1575 such intervals, having a standard deviation of $\sqrt{10^4 \times 0.1575 \times 0.8425}$, or 36.4 intervals.

We mentioned earlier that the outcome of an imagined infinite number of trials would demonstrate the form of the distribution which describes the probability of each individual result. In a subsequent section we shall develop a procedure which will enable one to use the results of Eqs.
(12-3) and (12-4) with a finite number of trials in order to assess numerically the probability that his observational results are indeed governed by the distribution that he has proposed for them.

12-2 RADIOACTIVE COUNTING

Our extension to N trials is especially suited to radioactive counting because of the nature of the probability of success involved in a single trial of one element in such a problem. For this reason, we shall examine counting experiments with a little more care at this point before going further.

In Section 6-6 we considered as an example the decay of ½ mg of $_{92}\mathrm{U}^{238}$. It was pointed out that the probability of decay of a single atom, i.e., the probability of success in a single trial of one element, was about 5 × 10^-18 per sec. Thus, in that section, a single trial consisted of the observation for one sec. It is a quantum mechanical result that if an atom is known to be $_{92}\mathrm{U}^{238}$, i.e., undecayed, at a particular time, then the probability that it will decay subsequently increases linearly with the time, so long as that time is short compared with the mean life. We have the alternatives then of regarding such an experiment as N trials of one-sec duration each, for each of which p = 5 × 10^-18, or as a single trial of N-sec duration, for which p = N × 5 × 10^-18, or as anything between these two extremes. As is so often the case, the choice depends on the purposes of the experiment. If the duration of a trial is short, obviously there will be considerable fluctuation in the number of counts per trial, i.e., per time interval. We would use a large number of relatively short intervals if we wanted to test whether the decay rate follows the Poisson distribution. On the other hand, when the object is the more common one of determining the probability of decay per sec, the best and simplest procedure is to use a single long interval.
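The equivalence between many short intervals and one long interval, worked out with the Section 6-6 figures in the next paragraphs and in Table 12-1, can be checked numerically. The sketch below assumes those figures (1.3 × 10^18 atoms, decay probability 5 × 10^-18 per atom per sec), takes σ = √μ for the counts in each interval, and assumes the number of atoms stays effectively constant; the helper names and M = 16 are illustrative only.

```python
from math import sqrt

ATOMS = 1.3e18      # undecayed atoms in the Section 6-6 example
P_PER_SEC = 5e-18   # decay probability per atom per second

def counts_for_interval(seconds):
    """Expected counts in one interval and their Poisson standard deviation."""
    mu = ATOMS * P_PER_SEC * seconds
    return mu, sqrt(mu)

def rate_estimate(seconds, m):
    """Counts/sec and its standard deviation from the average of m intervals,
    assuming the number of atoms is effectively unchanged throughout."""
    mu, sigma = counts_for_interval(seconds)
    return mu / seconds, sigma / sqrt(m) / seconds

M = 16  # illustrative number of 10^4-sec observations
print(counts_for_interval(1e4))       # expected counts near 6.5e4, sigma near 255
print(rate_estimate(1e4, M))          # M intervals of 10^4 sec each
print(rate_estimate(1e4 * M, 1))      # one interval of 10^4 M sec
```

Both rate estimates come out as 6.5 counts/sec with a standard deviation of 0.0255/√M, which is the content of Table 12-1.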
We will continue using the numerical values of the example in Section 6-6, writing results as though the expected values of μ and σ had been observed experimentally. Suppose that a single trial consists of the observation of the 1.3 × 10^18 atoms for 10^4 sec. Then N = 1 and p = 5 × 10^-14. The Poisson conditions are still satisfied; the expectation value, however, is now 6.5 × 10^4, with a standard deviation of $\sqrt{6.5 \times 10^4}$, or 255 counts per interval of 10^4 sec. When the intervals are of one-sec duration, we see from Section 6-6 that the observation is 6.5 ± 2.55 counts per sec.* The present result is 6.5 × 10^4 ± 255 counts per 10^4 sec, or 6.5 ± 0.0255 counts per sec. Furthermore, with the expectation value as large as 6.5 × 10^4 (much smaller values would suffice for this), the situation is as described in Section 8-3. That is, if we took observations for several, say M, 10^4-sec intervals, we would expect them to follow a normal distribution with a mean of 6.5 × 10^4 and a standard deviation of 255. The difference between the mean and the most probable value, which we found in Section 8-3, is of no significance. It follows from Eq. (10-15), derived for a normal distribution, that the average of M observations, each for a 10^4-sec interval, would have a standard deviation of $255/\sqrt{M}$. We see then that when the counts are given on a per-sec basis and all the readings are used, it is immaterial whether we look at these M observations as described, or as a single observation for an interval of 10^4 M sec. This conclusion is most easily described by a table (see Table 12-1).

* Here it is convenient to write the result in this way, but the reader should recall the discussions in Section 9-4 relative to the meaning and usefulness of the standard deviation for asymmetric and noncontinuous distributions.

Table 12-1

Interval   No. of         μ                   σ                   σ of mean           Rate
(sec)      observations   (counts/interval)   (counts/interval)   (counts/interval)   (counts/sec)
10^4       M              6.5 × 10^4          255                 255/√M              6.5 ± 0.0255/√M
10^4 M     1              6.5 × M × 10^4      255√M               255√M               6.5 ± 0.0255/√M

The extension of this discussion to other decay rates, masses of material, lengths of interval, and numbers of observations is obvious. It must be remembered, however, that the whole discussion hinged on the assumption that the value of n, the number of basic elements, is unchanged during the progress of the experiment. No matter how the total number of counts involved is partitioned, this number must be small compared with the total number of atoms present if the preceding discussion is to be applicable.

12-3 MULTINOMIAL DISTRIBUTION

When we considered the rolling of n dice in Section 6-6, our attention was centered on the probable number of dice showing a particular face without regard to what faces the other dice showed. That is, we found that in a rolling of n dice, the probability of obtaining r aces, for instance, is $C(n, r)(\tfrac{1}{6})^r(\tfrac{5}{6})^{n-r}$, regardless of what the other $n - r$ dice show so long as they are not aces. There are times, however, when one is concerned with what other faces, or, depending on the problem, the analogy to "other faces," turn up along with these r aces. In the case of a good die, the probability that any one face appears as the result of a toss is the same, 1/6. In the following discussion, however, while it is helpful to keep the dice in mind as a framework on which to hang the argument, we shall distinguish the probabilities of occurrence of different elementary results from one another so that our conclusions will have more general applicability than just to the throwing of good dice.

Suppose there is an elementary event, such as the rolling of a single die, which can produce any one of k results, and suppose also that the probability of the ith result is $p_i$. Then

$$\sum_{i=1}^{k} p_i = 1.$$

In the case of a die, for instance, all the values of $p_i$ are 1/6, and there are six of them.
Now, suppose we cause the successive occurrence of n of these events.* Let $n_i$ be the number of results of the ith type, which has elementary probability $p_i$. Then

$$\sum_{i=1}^{k} n_i = n,$$

and our object is to determine the probability of this particular distribution of the values of $n_i$. It is evident that the probability that the first event will produce the first type of result is $p_1$, that the second event will produce the first type of result is $p_1$, and so on up to the $n_1$th event, that the $(n_1 + 1)$th event will produce the second type of result is $p_2$, and so on through the nth event. The joint probability of these n results is then

$$\prod_{i=1}^{k} p_i^{n_i}.$$

In fact, this is also the joint probability of the n results, distributed as to type according to the specified values of $n_i$, for any order of appearance. Our interest, however, is not in any particular order of occurrence; we wish to know the probability of getting the specified distribution of the values of $n_i$ regardless of the order in which they occur. Thus we need here, as in the binomial distribution, a factor which corresponds to the $C(n, r)$ of the latter. We can make use of the knowledge gained in the discussion of $C(n, r)$ in Section 6-5 to arrive at the present factor. The starting point is Eq. (6-3), which gives the number of combinations of n things taken r at a time. It is to be remembered that this expression is based on the fact that the r things can be any r of the group of n things, and that any particular r things can be picked in any order. Then, if we are only interested in the number $n_1$, we will find the necessary multiplicative factor to be

$$\frac{n!}{n_1!\,(n - n_1)!}.$$

* Following the discussions of Section 12-1, the reader will understand that n successive trials with one element, followed by appropriate sorting of the results into groups, is equivalent to the simultaneous use of n elements.

For any one choice of $n_1$ things, we can choose $n_2$ in

$$\frac{(n - n_1)!}{n_2!\,(n - n_1 - n_2)!}$$

ways since, after having chosen $n_1$, there are only $(n - n_1)$ things from which to pick $n_2$. We can repeat this argument for all such factors, the last of which is

$$\frac{\left[n - \sum_{i=1}^{k-1} n_i\right]!}{n_k!\,0!},$$

where, as was pointed out in Section 6-5, 0! = 1. Since each of these factors is the number of choices of a particular type for any single choice of the types preceding it, the total number of ways of picking $n_1$ things of a first type, $n_2$ things of a second type, and so on up to $n_k$ things of a kth type, out of a total of n things, is the product of all these. Thus the desired multiplicative factor is

$$\frac{n!}{n_1!\,n_2!\cdots n_k!},$$

and the multinomial distribution is

$$\mathcal{P}(n_1, n_2, \ldots, n_k) = \frac{n!}{n_1!\,n_2!\cdots n_k!}\,p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}. \tag{12-5}$$

12-4 THE RANGE

In Section 10-11 we gave an approximate derivation for the expectation value of the range, the range being the difference between the largest and the smallest readings in a set. In that derivation it was supposed that the individual readings were normally distributed, but it was pointed out that the range itself has a distribution which is not normal. In the present section we shall begin the examination of several quantities which fit this same general description; they are all descriptive of some aspect of a finite group of readings drawn from a normal distribution, but the quantities themselves are variables which are not normally distributed.

This kind of interplay between different forms of distributions appeared in Section 12-1; it was shown that the probability of R successes or, equivalently, of R particular results out of N trials is binomially distributed regardless of the form of the distribution which determines the probability of occurrence of that particular result in a single trial. A proper discussion of the distribution of the range must begin with an extension of this idea to the multinomial distribution.
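The multinomial distribution of Eq. (12-5), the tool on which the range derivation is about to rest, is easy to evaluate directly. The following Python sketch is a minimal illustration; the function name is hypothetical, and the dice cases are chosen only to check the formula. It evaluates one specific outcome for three dice, confirms that with k = 2 types Eq. (12-5) reduces to the binomial term $C(n, r)p^r q^{n-r}$, and sums Eq. (12-5) over every way of distributing four dice among the six faces.

```python
from itertools import product
from math import comb, factorial, prod

def multinomial_prob(counts, probs):
    """Eq. (12-5): probability that n = sum(counts) events give counts[i]
    results of type i, where type i has elementary probability probs[i]."""
    coeff = factorial(sum(counts))
    for n_i in counts:
        coeff //= factorial(n_i)   # stays an exact integer at every step
    return coeff * prod(p**n_i for p, n_i in zip(probs, counts))

die = [1 / 6] * 6   # a good die: six equally likely faces

# Three dice showing one ace, one deuce, and one six, in any order:
# 3!/(1! 1! 1!) (1/6)^3 = 6/216.
p_1_2_6 = multinomial_prob([1, 1, 0, 0, 0, 1], die)

# With k = 2 types the multinomial reduces to the binomial term C(n, r) p^r q^(n - r).
n, r, p = 10, 2, 1 / 6
p_multi = multinomial_prob([r, n - r], [p, 1 - p])
p_binom = comb(n, r) * p**r * (1 - p) ** (n - r)

# Summed over every way of distributing n = 4 dice among the six faces, Eq. (12-5) gives 1.
total = sum(multinomial_prob(c, die)
            for c in product(range(5), repeat=6) if sum(c) == 4)

print(p_1_2_6, p_multi, p_binom, total)
```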
While the discussion will proceed on the assumption that the probabilities of occurrence of individual events in a single trial are given by the normal distribution, the reader should remember that this assumption is not essential to the argument; we use it merely because this is the most used and tabulated case. One part of the argument, however, will be peculiar to using continuous distributions as the source of the probabilities of individual events. With a continuous distribution, as illustrated in Section 9-5, we only get a finite probability by integrating the distribution function over some interval of the variable which is the argument of the function. Such an interval is arbitrary and changeable for different applications. Hence we will find a distribution function for the range, which we will use in the same way as we used the normal function. To find the probability that the range lies within a certain interval, we must integrate the distribution function over that interval. The procedure which we will use to arrive at the function is a limiting process similar to that used in the definition of the derivative.

The point of departure can be made crystal clear by a statement which is almost foolishly obvious. Among the finite number of readings in hand there must be a greatest and a smallest, and all the others must lie between these two. Thus we must determine the probability that all our observations lie within specified bounds. To do so, we first divide the whole interval in which the readings can fall into five sections. The first section extends from $-\infty$ to $x - \Delta x/2$, where x is the smallest reading; the second extends from $x - \Delta x/2$ to $x + \Delta x/2$; the third from $x + \Delta x/2$ to $y - \Delta y/2$, where y is the greatest reading; the fourth from $y - \Delta y/2$ to $y + \Delta y/2$; and the fifth extends from $y + \Delta y/2$ to $+\infty$.
For a normal distribution with expectation value μ and standard deviation σ, we can write the distribution function as

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] \tag{12-6}$$

if we use Eq. (9-4) in Eq. (8-12) and if we agree to let x now represent a reading; the real error then would be $(x - \mu)$. The probabilities that a reading will occur in the intervals described above are then

$$p_2 = \frac{1}{\sigma\sqrt{2\pi}} \int_{x - \Delta x/2}^{x + \Delta x/2} \exp\left[-\frac{(\xi - \mu)^2}{2\sigma^2}\right] d\xi, \quad \text{etc.}$$

We must next write down the probability that, of the n readings, none will fall in the first or fifth intervals, one each in the second and fourth intervals, and $n - 2$ in the third interval, where the order in which the readings of various sizes occur is immaterial. This is clearly a multinomial distribution with a probability, from Eq. (12-5),

$$\Delta P = \frac{n!}{0!\,1!\,(n-2)!\,1!\,0!}\,p_1^0\, p_2\, p_3^{n-2}\, p_4\, p_5^0 = n(n-1)\,p_2\, p_4\, p_3^{n-2}; \tag{12-7}$$

the probability is written as ΔP since it depends on the sizes of Δx and Δy. The limiting process mentioned earlier will involve an evaluation at small values of Δx and Δy. Consequently, we keep only those terms in the evaluation of $p_1$, $p_2$, etc., which contain Δx and Δy to their first powers as factors. Expressions like those for $p_2$ and $p_4$ were evaluated under these conditions in Section 8-1. The results for these two are

$$p_2 = \frac{\Delta x}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] \tag{12-8}$$

and

$$p_4 = \frac{\Delta y}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(y - \mu)^2}{2\sigma^2}\right]. \tag{12-9}$$

By using the definition of f in Eq. (12-6), we examine $p_3$:

$$p_3 = \int_{x + \Delta x/2}^{y - \Delta y/2} f(\xi)\,d\xi = \int_x^y f(\xi)\,d\xi - \int_x^{x + \Delta x/2} f(\xi)\,d\xi - \int_{y - \Delta y/2}^{y} f(\xi)\,d\xi \approx \int_x^y f(\xi)\,d\xi - f\!\left(x + \frac{\Delta x}{4}\right)\frac{\Delta x}{2} - f\!\left(y - \frac{\Delta y}{4}\right)\frac{\Delta y}{2}.$$

The first term will retain its value even when Δx and Δy become vanishingly small. Mindful of our intent to keep only the first powers of Δx and Δy and inspecting Eqs. (12-7) through (12-9), we see that for $p_3$ we need only keep the integral from x to y, since $p_2$ and $p_4$ already supply one factor each of Δx and Δy. The expression for ΔP then will involve the product ΔxΔy.
If we divide this expression by ΔxΔy and make the latter vanishingly small, we obtain the following distribution function as the result:

$$f(x, y) = \frac{n(n-1)}{2\pi\sigma^2} \exp\left[-\frac{(x - \mu)^2 + (y - \mu)^2}{2\sigma^2}\right] \left[\frac{1}{\sigma\sqrt{2\pi}} \int_x^y \exp\left(-\frac{(\xi - \mu)^2}{2\sigma^2}\right) d\xi\right]^{n-2}. \tag{12-10}$$

If we wished to know the probability of finding a smallest reading between x and $x + \Delta x$ along with a greatest reading between y and $y + \Delta y$, we would evaluate

$$\int_x^{x + \Delta x} \int_y^{y + \Delta y} f(x', y')\,dy'\,dx'. \tag{12-11}$$

This is not what we are after, however. We wish to have a function with which to find the probability of r lying between R and $R + \Delta R$, where $r = y - x$, regardless of the individual values of x and y. For this purpose we must make a change of variable,

$$y = x + r, \tag{12-12}$$

and then integrate over all possible values of x at a constant value of r. The reader will realize that r is bound to be nonnegative.

Before rewriting the function with the new variable we note that there are other useful changes of variables; the reader should verify the results of all of these. After using Eq. (12-12), let

$$u = \frac{x - \mu}{\sigma}, \qquad v = \frac{\xi - \mu}{\sigma}, \qquad w = \frac{r}{\sigma}.$$

With these substitutions, we obtain

$$f(x, y)\,dx\,dy \;\rightarrow\; \sigma^2 f(u, w)\,du\,dw,$$

so that the distribution function becomes

$$f(w, n) = \frac{n(n-1)}{(2\pi)^{n/2}} \int_{-\infty}^{\infty} \exp\left[-\frac{u^2 + (u + w)^2}{2}\right] \left(\int_u^{u+w} e^{-v^2/2}\,dv\right)^{n-2} du. \tag{12-13}$$

Here n in f(w, n) is arbitrarily inserted for convenience; it is the total number of readings. Inspection of Eq. (12-13) shows that, remarkably enough, f(w, n) is independent of μ and σ; it depends only on w and n.

While Eq. (12-13) looks very complex, and indeed its numerical evaluation can be complicated for n > 2, we have seen that it is reasonably simple in concept. Furthermore, its basic meaning and its use are no different from those of the normal-distribution function given by Eq. (12-6). In particular, the expectation value for the ratio of the range to the standard deviation, deduced approximately in Section 10-11, is given exactly by

$$\mu_w(n) = \int_0^\infty w\, f(w, n)\,dw.$$
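Although Eq. (12-13) has no simple closed form for n > 2, it can be evaluated numerically without much trouble. The following Python sketch is an illustration, not a method from the text: it writes the inner integral with the error function, $\int_u^{u+w} e^{-v^2/2}\,dv = \sqrt{\pi/2}\,[\operatorname{erf}((u+w)/\sqrt{2}) - \operatorname{erf}(u/\sqrt{2})]$, uses Simpson's rule for both integrations, and truncates the infinite limits at |u| = 8 and w = 12, an assumption that is ample for small n. For n = 2 the mean should come out as $2/\sqrt{\pi} \approx 1.128$, the exact value derived in what follows, and for n = 5 it should be close to the commonly tabulated value 2.326.

```python
import math

SQRT2 = math.sqrt(2.0)

def simpson(fn, a, b, steps):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = fn(a) + fn(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * fn(a + i * h)
    return total * h / 3.0

def range_density(w, n, u_lim=8.0, steps=200):
    """f(w, n) of Eq. (12-13): density of w = (range)/sigma for n normal readings."""
    coef = n * (n - 1) / (2.0 * math.pi) ** (n / 2.0)

    def integrand(u):
        # Inner integral of exp(-v^2/2) from u to u + w, expressed with erf.
        inner = math.sqrt(math.pi / 2.0) * (math.erf((u + w) / SQRT2) - math.erf(u / SQRT2))
        return math.exp(-(u * u + (u + w) ** 2) / 2.0) * inner ** (n - 2)

    return coef * simpson(integrand, -u_lim, u_lim, steps)

def range_area_and_mean(n, w_max=12.0, steps=120):
    """Area under f(w, n) and mu_w(n), both integrated over 0 <= w <= w_max."""
    area = simpson(lambda w: range_density(w, n), 0.0, w_max, steps)
    mean = simpson(lambda w: w * range_density(w, n), 0.0, w_max, steps)
    return area, mean

for n in (2, 5):
    area, mean = range_area_and_mean(n)
    print(f"n = {n}: area = {area:.5f}, mu_w(n) = {mean:.4f}")
```

An area close to 1 confirms that the truncated limits and step sizes are adequate; the same `range_density` could be integrated from 0 to W to obtain the probability of Eq. (12-14).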
The standard deviation of the ratio is found from

$$\sigma_w^2(n) = \int_0^{\infty}\left[w-\mu_w(n)\right]^2 f(w,n)\,dw,$$

and the probability that the ratio lies within the interval from zero to $W$ is

$$\mathcal{P}(W,n) = \int_0^{W} f(w,n)\,dw. \tag{12-14}$$

For purposes of illustration, we shall evaluate these expressions for the simplest form of $f(w,n)$, which is $f(w,2)$. To do this, we must first find $f(w,2)$. Equation (12-13) gives

$$f(w,2) = \frac{1}{\pi}\int_{-\infty}^{\infty}\exp\left[-\frac{u^2+(u+w)^2}{2}\right]du = \frac{1}{\pi}\,e^{-w^2/2}\int_{-\infty}^{\infty}e^{-(u^2+uw)}\,du.$$

One way of evaluating the integral is shown in Appendix 5, where it is found to have the value $\sqrt{\pi}\,e^{w^2/4}$. Thus

$$f(w,2) = \frac{1}{\sqrt{\pi}}\,e^{-w^2/4},$$

which is just twice the normal-distribution function with $\sigma = \sqrt{2}$. After noting this fact, however, we should recall that $w \ge 0$ only. Thus $f(w,2)$ is properly normalized; that is,

$$\int_0^{\infty} f(w,2)\,dw = 1.$$

The expectation value for $w$ when there are only two readings is then

$$\mu_w(2) = \frac{1}{\sqrt{\pi}}\int_0^{\infty} w\,e^{-w^2/4}\,dw.$$

Since $w\,dw = \tfrac12\,d(w^2)$, we can easily obtain $\mu_w(2) = 2/\sqrt{\pi}$. The standard deviation will be given by

$$\sigma_w^2(2) = \frac{1}{\sqrt{\pi}}\int_0^{\infty} w^2 e^{-w^2/4}\,dw - \frac{4}{\sqrt{\pi}}\left(\frac{1}{\sqrt{\pi}}\int_0^{\infty} w\,e^{-w^2/4}\,dw\right) + \frac{4}{\pi}\left(\frac{1}{\sqrt{\pi}}\int_0^{\infty} e^{-w^2/4}\,dw\right).$$

Integrals of the form in the first term above are discussed in Appendix 6. In the second term, the factor in parentheses is $\mu_w(2)$, while the factor in parentheses in the last term is unity. The first term equals 2, and the last two terms combine to $-8/\pi + 4/\pi$, so the final result is

$$\sigma_w(2) = \sqrt{2 - \frac{4}{\pi}}.$$

Table A on p. 230 describes the normal distribution by giving the areas under the distribution function for uniform increments of the argument of the function. Various measures of the distributions of finite-sized groups of readings which we shall discuss in this chapter, including the range, are generally tabulated in an inverse manner. That is, values of a variable integration limit are usually tabulated for particular, often uniformly incremented, areas under the distribution functions. These areas might be computed by integration from the tabulated values to infinity, or from zero to the tabulated values, or otherwise in ways most appropriate to the quantity being tabulated. In principle, the problem is exactly the same as that of the calculation of probable error, found in Section 9-2. For the present case, Eq. (12-14) indicates that the integration is to be from zero to some value $W$, where $W$ is either tabulated or plotted.

As an example, suppose that we wish to know the value of $W$ such that there is a 90% chance that the ratio of the difference of a pair of readings to the standard deviation of the distribution from which the readings are drawn lies between zero and $W$. Then the equation to be solved is

$$0.9 = \frac{1}{\sqrt{\pi}}\int_0^{W} e^{-w^2/4}\,dw. \tag{12-15}$$

It was mentioned earlier that this distribution function is of the normal form. In fact, if we set $w = \sqrt{2}\,t$, we can rewrite Eq. (12-15) as

$$0.45 = \frac{1}{\sqrt{2\pi}}\int_0^{W/\sqrt{2}} e^{-t^2/2}\,dt$$

and interpolate from Table A in the same way as we do in the rejection of readings. Thus $W/\sqrt{2} = 1.645$, so that $W = 2.326$.

It will be well now to review the meaning of these results. Suppose that we draw a very large number of pairs of readings ($n = 2$ in our example) from a normally distributed universe with a standard deviation $\sigma$. We would expect the average of the differences between the two readings constituting each pair to be

$$\bar{R} = \frac{2\sigma}{\sqrt{\pi}}.$$

We would also expect the separate values of $w = r/\sigma$ to be distributed about the number $2/\sqrt{\pi}$ with a standard deviation of $\sqrt{2 - 4/\pi}$. Finally, we would expect 90% of the values of $r$ to be less than or equal to $2.33\sigma$. Perhaps it should be reemphasized that the $\sigma$ in these expressions is that of the normal distribution from which the readings are drawn.
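The $n = 2$ results can be verified in a few lines of Python. This sketch (hypothetical function names; standard library only) integrates the closed form $f(w,2) = e^{-w^2/4}/\sqrt{\pi}$ by the midpoint rule to recover $\mu_w(2)$ and $\sigma_w(2)$, and replaces the interpolation in Table A with `statistics.NormalDist().inv_cdf`, which is a substitute for the table rather than the method used in the text.

```python
import math
from statistics import NormalDist

def f2(w):
    """Closed form f(w, 2) = exp(-w^2/4) / sqrt(pi), valid for w >= 0."""
    return math.exp(-w * w / 4.0) / math.sqrt(math.pi)

def integrate(g, a, b, steps=20000):
    """Plain midpoint rule; adequate for these smooth, rapidly decaying integrands."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

mu2 = integrate(lambda w: w * f2(w), 0.0, 20.0)
var2 = integrate(lambda w: (w - mu2) ** 2 * f2(w), 0.0, 20.0)

def range_limit_two_readings(prob):
    """W such that P(0 <= r/sigma <= W) = prob for n = 2.
    With w = sqrt(2) t, Eq. (12-15) becomes a one-sided normal area of prob/2."""
    return math.sqrt(2.0) * NormalDist().inv_cdf(0.5 + prob / 2.0)

print(mu2, 2 / math.sqrt(math.pi))
print(math.sqrt(var2), math.sqrt(2 - 4 / math.pi))
print(range_limit_two_readings(0.90), range_limit_two_readings(0.99))
```

With `prob = 0.99` the same function gives about 3.643, the one-in-a-hundred limit used when deciding whether to reject a pair of readings.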
On the other hand, if one takes only two readings with a difference $R$, he would estimate $\sigma$ to be $R\sqrt{\pi}/2$ and the standard deviation of the average of the two to be

$$\sigma_0 = \frac{R\sqrt{\pi}}{2\sqrt{2}}. \tag{12-16}$$

Furthermore, if he knew $\sigma$, or thought he did, from earlier observations, and if he took a pair of readings and found that their difference divided by $\sigma$ was 2.33 or greater, he might begin to wonder whether something had gone wrong, since there is only one chance in ten of this happening. Most probably he would accept this result, but if the ratio were as large as 3.64, he would probably reject the readings or reexamine his value of $\sigma$, since this value of the ratio has only one chance in one hundred of being exceeded.

From Eq. (12-16) we find that

$$\frac{\sigma_0}{R} = \frac{\sqrt{\pi}}{2\sqrt{2}}.$$

This kind of information is to be found in Fig. 10-2, though the figure does not give values of $\sigma_0/R$ for $n < 5$ because the approximation used in its derivation gives poor results at low values of $n$. More complete information is given in Fig. 12-1, which includes results for the small values of $n$ as well as bounds between which $\sigma_0/R$ can be expected to fall with the specified probabilities. The expectation value of $\sigma_0/R$ given here should be compared with that in Fig. 10-2. The precipitous loss of precision with few readings is clearly shown by the limit curves. (Figure 12-1 is plotted from Table V of Ref. 9.)

[Figure 12-1: $\sigma_0/R$ (vertical axis, 0.1 to 1.2) plotted against the number of readings $n$ (horizontal axis, up to 16). Curves: the expectation value of $\sigma_0/R$; limits between which there is a 90 percent chance that $\sigma_0/R$ will fall; a second pair of limits.]

12-5 THE χ-SQUARE DISTRIBUTION: TESTS OF FIT

In Section 12-1, we promised to develop a procedure by which we can assess numerically, from the results of Eqs.
(12-3) and (12-4) and a finite number of trials, the probability that the observational results are indeed governed by the proposed distribution. Although, in one sense or another, any of the distributions discussed in Sections 12-4 through 12-8 will serve the same purpose, it was the $\chi^2$ (chi-square) distribution which we had in mind.

We have repeatedly emphasized that it would take an infinite number of observations to demonstrate with absolute certainty that one's observations follow a particular distribution. Obviously one cannot make an infinite number of observations. On the other hand, one can easily propose some distribution that the observations appear to be following. He can then calculate the expected results and compare them with the actual results. If the proposed distribution is correctly chosen (and the choice becomes more restricted with increasing number of observations), he will find the results to agree closely, though not exactly, with each other. With the aid of the $\chi^2$-distribution he can calculate the extent of chance disagreement that might be expected with a finite number of observations even though the proposed distribution is correct. The inference is that if the disagreement is greater than expected, then the proposed distribution must be wrong. It should be understood that "wrong" includes the case where the correct form might have been chosen, but with wrong parameters.

Let us illustrate this last statement with a classic example of the application of the $\chi^2$-distribution. Suppose that one makes several trials at rolling 10 biased dice, each of which has been loaded in the same way. The expected distribution is a binomial distribution of the kind illustrated in Fig. 6-2, but the numerical values will not be as given in that figure. The dice for the distributions of Fig.
6-2 were "good" in the sense that the a priori probability for the appearance of any given face in a single toss of one die was 1/6. When the dice are loaded, this is no longer the case.

In the following derivation, we will see that the number $\chi^2$ is basically the ratio of the sum of the squares of the errors to the square of the standard deviation. In our earlier discussions (see Sections 12-1, 12-2, and 12-4) we pointed out the distinction between the binomial or multinomial distributions resulting from $N$ trials of an experiment and the distribution which describes the expected results of a single trial. Also, when we discussed the range, we chose to use a normal distribution for the probabilities of single events; we shall continue to do so.* In this case, the "errors" in the definition of $\chi^2$ almost certainly will be residuals, since it is very unlikely that for those distributions assumed to be normal there is any other source of information about the expectation value than the arithmetic mean of the observations. The same situation characterizes the standard deviation, and we shall use these two facts as constraints, the effect of which we shall describe later.

The $\chi^2$-test is also applicable to other kinds of distributions. For a derivation not specifically tied to the normal distribution, the reader is referred to Fry [2]. We have already met several examples of such other distributions. In the case of a binomial distribution, when the problem is the tossing of coins or the rolling of dice, the errors in the definition of $\chi^2$ may be considered as the real errors to the extent that the dice or coins may be considered as good, and the value of the probability of success in a single trial of one element is given by a priori considerations.

* The arguments presented in this and the following two sections are similar to those presented by Arley and Buch [8], pp. 93f.
Then there are fewer constraints than in the case of a normal distribution. However, the number of constraints will increase if we take the view that the elemental probabilities are biased by the manner of tossing, by loaded dice, filed coins, etc. To take this view means that some number or numbers previously assumed to be known must now be calculated from the observations before we can proceed with the test.

Suppose that observations are drawn from a normal distribution which has a mean $\mu$ and a standard deviation $\sigma$. Only some finite number can be drawn; let this number be $n$ and let the individual observations be designated $x_i$. The probability of drawing the $i$th value is, from Eq. (12-6),

$$f(x_i)\,dx_i = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right]dx_i.$$

The total probability of getting the $n$ independent observations, being the product of the probabilities of getting the individual ones, is

$$d\mathcal{P} = \frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\frac{\sum(x_i-\mu)^2}{2\sigma^2}\right]\prod dx_i, \tag{12-17}$$

where the sum $\sum$ and product $\prod$ are over all the values of $i$ from 1 through $n$.

The next step is to define $n$ new variables to replace the $x_i$'s. So long as there are $n$ new independent ones, defined in terms of the $n$ $x_i$'s, the problem will be just as describable in terms of them as in terms of the $x_i$'s. First, let the $(n-1)$th of the new variables, $u_{n-1}$, be the mean of the observations. That is,

$$u_{n-1} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i. \tag{12-18}$$

As before, $v_i$ will represent the $i$th residual:

$$v_i = x_i - \bar{x}. \tag{12-19}$$

We define the $n$th new variable, $u_n$, as

$$u_n = q = \left(\sum_{i=1}^{n} v_i^2\right)^{1/2}. \tag{12-20}$$

Finally, the first $(n-2)$ of the new variables will be defined as

$$u_i = v_i/q, \qquad i = 1, \ldots, n-2. \tag{12-21}$$

The sum of the squares of the real errors, which appears in Eq. (12-17), becomes, upon expansion, $\sum x_i^2 - 2\mu\sum x_i + n\mu^2$. From Eq. (12-18) we obtain $\sum x_i = n\bar{x}$, and from Eqs. (12-19) and (12-20),

$$q^2 = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 = \sum x_i^2 - n\bar{x}^2.$$
With this as a source for $\sum x_i^2$, we find that the sum of the squares of the real errors becomes

$$\sum(x_i-\mu)^2 = n(\bar{x}-\mu)^2 + q^2. \tag{12-22}$$

It is now necessary to convert $\prod_{i=1}^{n} dx_i$ to some expression in terms of the new variables. The first step is to solve for the old variables in terms of the new ones. This is easy for the first $n-2$ values of $i$. Recognizing that $\bar{x}$ is $u_{n-1}$ and $q$ is $u_n$, we can use Eqs. (12-19) and (12-21) to find

$$x_i = u_{n-1} + u_n u_i, \qquad i = 1, \ldots, n-2. \tag{12-23}$$

From Eq. (12-18), and by writing the last two terms of the sum separately, we find that

$$n u_{n-1} = x_n + x_{n-1} + \sum_{i=1}^{n-2} x_i, \tag{12-24}$$

and from Eq. (12-23),

$$\sum_{i=1}^{n-2} x_i = (n-2)u_{n-1} + u_n\sum_{i=1}^{n-2} u_i. \tag{12-25}$$

Henceforth we shall write $\sum_{i=1}^{n-2} u_i$ simply as $\sum u$. Then, combining Eqs. (12-24) and (12-25), we obtain

$$x_n + x_{n-1} = 2u_{n-1} - u_n\sum u.$$

Subsequently we will find it more convenient to write this as

$$(x_n - u_{n-1}) + (x_{n-1} - u_{n-1}) = -u_n\sum u. \tag{12-26}$$

By a similar operation with Eq. (12-20) we find that

$$(x_n - u_{n-1})^2 + (x_{n-1} - u_{n-1})^2 = u_n^2\left(1 - \sum u^2\right), \tag{12-27}$$

where, as before, $\sum u^2$ means $\sum_{i=1}^{n-2} u_i^2$. If we transpose $(x_n - u_{n-1})$ to the right-hand side of Eq. (12-26) and then square the equation, we will obtain an expression for $(x_{n-1} - u_{n-1})^2$ for substitution into Eq. (12-27). Solving the resulting quadratic equation results in

$$x_n = u_{n-1} - \tfrac12 u_n\left\{\sum u + \sqrt{2}\left[1 - \sum u^2 - \tfrac12\left(\sum u\right)^2\right]^{1/2}\right\}, \tag{12-28}$$

and a back substitution leads to

$$x_{n-1} = u_{n-1} - \tfrac12 u_n\left\{\sum u - \sqrt{2}\left[1 - \sum u^2 - \tfrac12\left(\sum u\right)^2\right]^{1/2}\right\}. \tag{12-29}$$

During the solution of the quadratic equation there will arise the question of the sign of the radical. The choice is immaterial, since it does not matter which observation is called $x_n$ and which $x_{n-1}$.

At this point the reader is referred to Appendix 7 on multiple integration.
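The change of variables of Eqs. (12-18) through (12-21) and its inversion by Eqs. (12-23), (12-28) and (12-29) can be exercised numerically. The following Python sketch is an illustration only: the function names are hypothetical, it assumes the readings are not all equal (so that $q > 0$), and it uses the ten readings listed in Table 12-2 as sample input. Because the sign of the radical is arbitrary, the last two readings are recovered only up to an interchange, so the check compares them as an unordered pair. It also confirms Eq. (12-22) for an arbitrary trial value of $\mu$.

```python
import math

def to_new_variables(x):
    """Eqs. (12-18)-(12-21): return (u_1..u_{n-2}, xbar = u_{n-1}, q = u_n)."""
    n = len(x)
    xbar = sum(x) / n
    v = [xi - xbar for xi in x]                # residuals, Eq. (12-19)
    q = math.sqrt(sum(vi * vi for vi in v))    # Eq. (12-20); assumes q > 0
    u = [vi / q for vi in v[: n - 2]]          # Eq. (12-21)
    return u, xbar, q

def from_new_variables(u, xbar, q):
    """Invert with Eq. (12-23) for x_1..x_{n-2} and Eqs. (12-28), (12-29) for the last two."""
    s = sum(u)                                 # sum u
    t = sum(ui * ui for ui in u)               # sum u^2
    # max() guards against a tiny negative argument caused by rounding
    root = math.sqrt(2.0) * math.sqrt(max(0.0, 1.0 - t - 0.5 * s * s))
    first = [xbar + q * ui for ui in u]        # Eq. (12-23)
    x_n = xbar - 0.5 * q * (s + root)          # Eq. (12-28)
    x_n_minus_1 = xbar - 0.5 * q * (s - root)  # Eq. (12-29)
    return first + [x_n_minus_1, x_n]

readings = [-4.5, 2.5, -1.5, -5.5, -0.5, -1.5, 1.5, -3.5, 4.5, -0.5]
u, xbar, q = to_new_variables(readings)
back = from_new_variables(u, xbar, q)

# Eq. (12-22) for an arbitrary trial value of mu
mu = 1.0
lhs = sum((xi - mu) ** 2 for xi in readings)
rhs = len(readings) * (xbar - mu) ** 2 + q ** 2
print(back[-2:], lhs, rhs)
```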
According to the procedures described in Appendix 7, it is necessary to set up a determinant of partial derivatives of the old variables with respect to the new ones in order to make the transition from one integrand and one set of differentials to an equivalent integrand, but in terms of new variables and new differentials. Thus, the necessary relation is now seen to be the following:

$$\prod_{i=1}^{n} dx_i \to J(x_1, \ldots, x_n; u_1, \ldots, u_n)\prod_{i=1}^{n} du_i, \tag{12-30}$$

where $J$ is the determinant just mentioned. As examples of the differentiation, we can easily see from Eq. (12-23) that for $x_3$, say, all derivatives with respect to the $u$'s are zero except

$$\frac{\partial x_3}{\partial u_3} = u_n, \qquad \frac{\partial x_3}{\partial u_{n-1}} = 1, \qquad \frac{\partial x_3}{\partial u_n} = u_3.$$

By extension, then, $J$ is the determinant

$$J = \begin{vmatrix}
u_n & 0 & \cdots & 0 & 1 & u_1 \\
0 & u_n & \cdots & 0 & 1 & u_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & u_n & 1 & u_{n-2} \\
\dfrac{\partial x_{n-1}}{\partial u_1} & \dfrac{\partial x_{n-1}}{\partial u_2} & \cdots & \dfrac{\partial x_{n-1}}{\partial u_{n-2}} & 1 & \dfrac{\partial x_{n-1}}{\partial u_n} \\
\dfrac{\partial x_n}{\partial u_1} & \dfrac{\partial x_n}{\partial u_2} & \cdots & \dfrac{\partial x_n}{\partial u_{n-2}} & 1 & \dfrac{\partial x_n}{\partial u_n}
\end{vmatrix},$$

where

$$\frac{\partial x_{n-1}}{\partial u_n} = -\tfrac12\left\{\sum u - \sqrt{2}\left[1-\sum u^2-\tfrac12\left(\sum u\right)^2\right]^{1/2}\right\}, \qquad \frac{\partial x_n}{\partial u_n} = -\tfrac12\left\{\sum u + \sqrt{2}\left[1-\sum u^2-\tfrac12\left(\sum u\right)^2\right]^{1/2}\right\},$$

$$\frac{\partial x_{n-1}}{\partial u_i} = -\tfrac12 u_n\left\{1 + \frac{2u_i + \sum u}{\sqrt{2}\left[1-\sum u^2-\tfrac12\left(\sum u\right)^2\right]^{1/2}}\right\}, \qquad \frac{\partial x_n}{\partial u_i} = -\tfrac12 u_n\left\{1 - \frac{2u_i + \sum u}{\sqrt{2}\left[1-\sum u^2-\tfrac12\left(\sum u\right)^2\right]^{1/2}}\right\},$$

for $i = 1, \ldots, n-2$.

It is a property of determinants that a factor that is common to every entry in a single column (or row) can be divided out as a multiplier of the determinant [12]. Examination of the determinant $J$ shows that $u_n$ is a factor common to the elements of every one of the $n$ columns except the last two. Furthermore, this is the only way in which $u_n$ enters the determinant. Thus

$$\prod_{i=1}^{n} dx_i = q^{n-2}\,C(u_1, \ldots, u_{n-2})\prod_{i=1}^{n-2} du_i\,d\bar{x}\,dq,$$

where $C$ represents what is left of the determinant after the division by $u_n$ ($= q$). Note that $u_{n-1}$ ($= \bar{x}$) does not appear in $J$ at all. Thus, at this point, the original probability as given in Eq. (12-17) has become

$$d\mathcal{P} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}\exp\left[-\frac{(\bar{x}-\mu)^2}{2(\sigma/\sqrt{n})^2}\right]\exp\left(-\frac{q^2}{2\sigma^2}\right)q^{n-2}\,d\bar{x}\,dq\;C\prod_{i=1}^{n-2} du_i. \tag{12-31}$$

The first thing we observe in Eq. (12-31) is that if we multiply by $\sqrt{n}/\sqrt{n}$, we can separate a set of factors

$$\frac{1}{(\sigma/\sqrt{n})\sqrt{2\pi}}\exp\left[-\frac{(\bar{x}-\mu)^2}{2(\sigma/\sqrt{n})^2}\right]d\bar{x}$$

from the others, where the latter become independent of $\bar{x}$. Thus we see that the error in $\bar{x}$, the mean of the observations, is a normally distributed variable with a standard deviation of $\sigma/\sqrt{n}$, as given previously in Eq. (10-15). From the remaining part of $d\mathcal{P}$ a second set of factors

$$\left(\frac{q}{\sigma}\right)^{n-2} e^{-q^2/2\sigma^2}\,d\!\left(\frac{q}{\sigma}\right) \tag{12-32}$$

can be divided out with no further reference to $q$ in the remaining factor.

The expression (12-32), involving the factor $q^{n-2}$, is not a normal distribution. Furthermore, considering that various numerical factors have been left behind in $C$, the distribution of $q/\sigma$ given by the expression (12-32) must also be presumed to be unnormalized. That is, contrary to the result found in our discussion of the range, where the integral of $f(w,2)$ over all possible values of $w$ was found to be unity, we must here make the integral over all values of $q$ unity by the calculation of a normalizing factor. The value of $d\mathcal{P}$ can be held unchanged by including the reciprocal of the normalizing factor in the part left behind.

In the previous case the only possible values for the range variable $w$ were found to be positive. Here also $q^2$ is necessarily positive, since it is a sum of squares. For $q$ itself, then, we find it convenient to use only the positive square root. Further, since $q$ and $\sigma$ enter the expression (12-32) only through the ratio $q/\sigma$, the normalizing factor must be a function only of $n$, which is the number of readings. Before evaluating the normalization factor, then, we shall take advantage of this fact by making certain changes of variable similar to some of the changes made in the discussion of the range. It is the ratio $q/\sigma$ that enters, just as it was the ratio of $r$ to $\sigma$ that entered into the distribution of the range.
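A short simulation illustrates the two results just obtained: the mean $\bar{x}$ scatters with standard deviation $\sigma/\sqrt{n}$, and the distribution of $q/\sigma$ does not involve $\mu$. The sketch also prints the average of $q^2/\sigma^2$, which comes out close to $n - 1$; that a $\chi^2$ variable with $f$ degrees of freedom has mean $f$ is a standard fact, not something derived in this section, and it anticipates the count of degrees of freedom introduced next. The sample size, seed, and parameter values are arbitrary choices, and the function name is hypothetical.

```python
import math
import random
import statistics

def simulate(n, mu, sigma, trials=20000, seed=2):
    """Draw `trials` samples of n normal readings and return
    (standard deviation of the sample means, average of q^2 / sigma^2)."""
    rng = random.Random(seed)
    means, w2 = [], []
    for _ in range(trials):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        q2 = sum((xi - xbar) ** 2 for xi in x)   # q^2, the sum of squared residuals
        means.append(xbar)
        w2.append(q2 / sigma ** 2)
    return statistics.pstdev(means), statistics.fmean(w2)

sd_mean, avg_w2 = simulate(n=5, mu=3.0, sigma=2.0)
print(sd_mean, 2.0 / math.sqrt(5), avg_w2)   # compare with sigma/sqrt(n) and n - 1
```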
Because only the ratio $q/\sigma$ enters, the distribution of $q/\sigma$ is independent of both the expectation value and the standard deviation of the distribution that determines the frequency of appearance of individual observations. Rather than discuss the distribution of $q/\sigma$, however, we shall discuss the distribution of $(q/\sigma)^2$; this will allow us to avoid the necessity of taking square roots. Thus we shall use

$$w^2 = \frac{q^2}{\sigma^2}$$

as an integration variable and reserve the designation $\chi^2$ for observed values of this quantity; $\chi^2$ will be an integration limit.

We recall that a number $\bar{x}$ was determined from the observations, so that there is a constraint on the value of $q$. This means that $q^2$ is a sum of squares of numbers such that the sum of the numbers themselves is zero. At most, therefore, only $n-1$ of the residuals can be called free variables. The number of "degrees of freedom" will be designated by $f$; here $f = n - 1$. Thus from the expression (12-32) we find that the distribution for which we must determine the normalization factor $P(f)$ is

$$\varphi(w^2, f)\,d(w^2) = P(f)\,(w^2)^{(f-2)/2}\,e^{-w^2/2}\,d(w^2), \tag{12-33}$$

where the factor of $\tfrac12$ in $dw = \frac{1}{2w}\,d(w^2)$ is included in $P(f)$.

The normalization factor is determined from

$$\int_0^{\infty}\varphi(w^2, f)\,d(w^2) = 1 = P(f)\int_0^{\infty}(w^2)^{(f-2)/2}\,e^{-w^2/2}\,d(w^2).$$

Here the reader is referred to Appendix 6 for a discussion of the evaluation of definite integrals of this form. We see from this Appendix that

$$P(f) = \left[\Gamma(f/2)\times 2^{f/2}\right]^{-1}, \tag{12-34}$$

where $\Gamma(f/2)$ is read as "the gamma function of $f/2$." The reader is not required to know any more about this function than is described in Appendix 6; it is a convenient designation to use here because of the way in which the value of $P(f)$ changes depending on whether $f$ is an even or an odd integer.

It was mentioned in Section 12-4 that various alternative but equivalent ranges of integration are used to describe the areas under distribution functions of the kind we are concerned with here. For the present case, Table B on p. 232 describes the distribution of $\chi^2$ by giving values of $\chi^2$ calculated from

$$\mathcal{P}(\chi^2, f) = \frac{1}{\Gamma(f/2)\,2^{f/2}}\int_{\chi^2}^{\infty}(w^2)^{(f-2)/2}\,e^{-w^2/2}\,d(w^2) \tag{12-35}$$

for particular specified values of $\mathcal{P}(\chi^2, f)$. The meaning of Eq. (12-35) is the following: for a given value of $f$, there is a probability $\mathcal{P}(\chi^2, f)$ that, purely by chance, an observed value of $\chi^2$ will be greater than or equal to the tabulated value when the observed value is calculated from the correct distribution.

The practical use of Eq. (12-35) via Table B, which is constructed from it, can be described best with the aid of examples to be given later. Before proceeding to these examples, however, it will be well to illustrate the construction of Table B. We will also need to do a little more mathematical work to derive a property called the additivity of $\chi^2$.

Suppose that we wish to determine from five observations the value of $\chi^2$ which has a 90% chance of being exceeded. In this case $f = 4$, which is an even number, so that from Appendix 6 we find that $\Gamma(2) = 1$. Thus we must evaluate $\chi^2$ in

$$0.9 = \frac14\int_{\chi^2}^{\infty} x\,e^{-x/2}\,dx.$$

This integral can be easily evaluated to yield the result $\chi^2 = 1.064$.

Suppose, on the other hand, that we wanted the value of $\chi^2$ which has only a 10% chance of being exceeded. Then we would expect it to be considerably larger than 1.064. From

$$0.1 = \frac14\int_{\chi^2}^{\infty} x\,e^{-x/2}\,dx$$

we obtain $\chi^2 = 7.779$.

Finally, suppose that there were six observations and we wanted the value of $\chi^2$ which has a 10% chance of being exceeded. Here $f = 5$, an odd integer. The evaluation of $\chi^2$ in this case is somewhat trickier and will be shown in greater detail. From Appendix 6 we find that $\Gamma(5/2) = \tfrac32 \times \tfrac12 \times \sqrt{\pi}$, so that

$$\mathcal{P}(\chi^2, 5) = 0.1 = \left(3\sqrt{2\pi}\right)^{-1}\int_{\chi^2}^{\infty}(w^2)^{3/2}\,e^{-w^2/2}\,d(w^2).$$

Since it is convenient here to use $w$ rather than $w^2$ as the integration variable, the equation can be transformed into

$$0.1 = \frac{2}{3\sqrt{2\pi}}\int_{\chi}^{\infty} w^4\,e^{-w^2/2}\,dw.$$

We must make two successive integrations by parts. First, we let $u = w^3$, $dv = w\,e^{-w^2/2}\,dw$ to obtain for the integral

$$\left[-w^3 e^{-w^2/2}\right]_{\chi}^{\infty} + 3\int_{\chi}^{\infty} w^2\,e^{-w^2/2}\,dw.$$

A second integration by parts, with $u = w$ and the same $dv$, yields

$$0.1 = \frac{2}{3\sqrt{2\pi}}\left(\chi^3 e^{-\chi^2/2} + 3\chi e^{-\chi^2/2} + 3\int_{\chi}^{\infty} e^{-w^2/2}\,dw\right).$$

Since we are merely verifying the entries in the table, we shall substitute the value $\chi^2 = 9.236$ from the table into this equation to see whether the right-hand side has the value 0.1. The first two terms are straightforward. The last term is evaluated as in the example for the range. That is, using $\chi = \sqrt{9.236} = 3.04$ and obtaining from Table A

$$\frac{1}{\sqrt{2\pi}}\int_0^{3.04} e^{-w^2/2}\,dw = 0.4988,$$

we find that

$$\int_{3.04}^{\infty} e^{-w^2/2}\,dw = \sqrt{2\pi}\,(0.5 - 0.4988),$$

which verifies the value 0.1 for $\mathcal{P}(9.236, 5)$.

Let us now proceed to discuss the additivity of $\chi^2$. If one had two independent sets of observations drawn from the same distribution, he could calculate $\chi^2$ for each. The distribution of $\chi^2$ for each would be given by Eq. (12-35), which is derived from Eq. (12-33); into this equation would enter in turn the number of degrees of freedom for each set. We wish to show that the distribution of the sum of the separate values of $\chi^2$ is a $\chi^2$-distribution for a number of degrees of freedom equal to the sum of the degrees of freedom in each set. If this can be done, the results can be extended without further proof to any number of sets by taking one set of a pair to be a previous combination. It should be noted that it is the combination of values of $\chi^2$ which is being discussed and not the combination of sets of readings.
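The normalization factor of Eq. (12-34) and the three entries of Table B worked out above can be reproduced with the Python standard library. In this sketch (hypothetical function names) the even-$f$ tail probability uses the finite-sum closed form $e^{-\chi^2/2}\sum_{k=0}^{f/2-1}(\chi^2/2)^k/k!$, a standard result that for $f = 4$ reduces to the $e^{-\chi^2/2}(1 + \chi^2/2)$ obtained by one integration by parts of the integral above; the quantiles are then found by bisection; and for $f = 5$ the Table A lookup is replaced by `math.erfc`.

```python
import math

def P(f):
    """Normalization factor of Eq. (12-34)."""
    return 1.0 / (math.gamma(f / 2.0) * 2.0 ** (f / 2.0))

def normalization_check(f, w_hi=40.0, steps=40000):
    """P(f) times the integral in Eq. (12-33), using d(w^2) = 2w dw so that the
    integrand 2 w^(f-1) exp(-w^2/2) stays finite at w = 0 even for f = 1."""
    h = w_hi / steps
    s = sum(2.0 * ((k + 0.5) * h) ** (f - 1) * math.exp(-((k + 0.5) * h) ** 2 / 2.0)
            for k in range(steps))
    return P(f) * s * h

def tail_prob_even(chi2, f):
    """Eq. (12-35) for even f: exp(-chi2/2) * sum_{k=0}^{f/2-1} (chi2/2)^k / k!."""
    if f % 2:
        raise ValueError("this closed form requires even f")
    half = chi2 / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(f // 2))

def chi2_for_prob(prob, f, lo=0.0, hi=200.0, iters=100):
    """Invert tail_prob_even by bisection; the tail probability falls as chi2 grows."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tail_prob_even(mid, f) > prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tail_prob_f5(chi2):
    """The f = 5 result after two integrations by parts; the remaining Gaussian
    tail integral is sqrt(pi/2) * erfc(chi / sqrt(2))."""
    chi = math.sqrt(chi2)
    e = math.exp(-chi2 / 2.0)
    gauss_tail = math.sqrt(math.pi / 2.0) * math.erfc(chi / math.sqrt(2.0))
    return 2.0 / (3.0 * math.sqrt(2.0 * math.pi)) * (chi ** 3 * e + 3.0 * chi * e + 3.0 * gauss_tail)

print([round(normalization_check(f), 6) for f in range(1, 7)])
print(round(chi2_for_prob(0.9, 4), 3), round(chi2_for_prob(0.1, 4), 3))
print(round(tail_prob_f5(9.236), 4))
```

The printed values should be close to 1 for every $f$, to the text's 1.064 and 7.779, and to 0.1.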
For two sets of readings, combining their separate values of $\chi^2$ and their degrees of freedom in this way implies two constraints or restrictions, corresponding to the calculation of a mean for each set. If the two sets were combined before the calculation of $\chi^2$ and the number of degrees of freedom, the result would imply that there was only one constraint, corresponding to the calculation of a single mean for the combined single set of readings.

Let the two sets of observations be designated $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$. Since every observation in each set is independent of the others, the joint probability for the existence of the two sets will involve the total sum of squares of the differences between the readings and the mean $\mu$ of the parent distribution [see Eq. (12-17)]. It is convenient for the present purpose to break this sum up into two parts. The joint probability is

$$d\mathcal{P} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n_1+n_2}\exp\left\{-\frac{1}{2\sigma^2}\left[\sum(x_i-\mu)^2 + \sum(y_i-\mu)^2\right]\right\}\prod dx_i\prod dy_i, \tag{12-36}$$

where $i = 1, \ldots, n_1$ for $x_i$ and $i = 1, \ldots, n_2$ for $y_i$. The various factors in this equation can be treated just as before until one arrives at a combined distribution corresponding to Eq. (12-33), which now has the form

$$\varphi(w_x^2, w_y^2, f_x, f_y)\,d(w_x^2)\,d(w_y^2) = P(f_x)P(f_y)\,(w_x^2)^{(f_x-2)/2}e^{-w_x^2/2}\,(w_y^2)^{(f_y-2)/2}e^{-w_y^2/2}\,d(w_x^2)\,d(w_y^2). \tag{12-37}$$

We again go through the now familiar process of changing integration variables. This time, let $w_x = w\cos\theta$, $w_y = w\sin\theta$, so that $w_x^2 = w^2\cos^2\theta$, $w_y^2 = w^2\sin^2\theta$, and $w^2 = w_x^2 + w_y^2$. Recalling that the integration variables were $w_x^2$ and $w_y^2$ rather than $w_x$ and $w_y$, and maintaining $w^2$ as the new integration variable, we write

$$d(w_x^2)\,d(w_y^2) = J(w_x^2, w_y^2; w^2, \theta)\,d(w^2)\,d\theta,$$

corresponding to Eq. (12-30), and find with the aid of Appendix 7 that

$$J(w_x^2, w_y^2; w^2, \theta) = 2w^2\sin\theta\cos\theta.$$

Then, referring back to Eq.
(12-37), we obtain

$$w^2\,(w_x^2)^{(f_x-2)/2}\,(w_y^2)^{(f_y-2)/2} = (w^2)^{(f-2)/2}(\cos\theta)^{f_x-2}(\sin\theta)^{f_y-2},$$

where $f = f_x + f_y$. The distribution now has the form

$$\varphi(w^2, \theta, f_x, f_y, f)\,d(w^2)\,d\theta = 2P(f_x)P(f_y)\,(w^2)^{(f-2)/2}\,e^{-w^2/2}\,(\sin\theta)^{f_y-1}(\cos\theta)^{f_x-1}\,d(w^2)\,d\theta.$$

Since we are only interested in the distribution of $w^2$, regardless of whether it is made up of a large $w_x^2$ and small $w_y^2$, or vice versa, we integrate over those values of $\theta$ which cover all these contingencies, which is from 0 to $\pi/2$. Thus we must evaluate

$$\int_0^{\pi/2}(\cos\theta)^{f_x-1}(\sin\theta)^{f_y-1}\,d\theta.$$

It is shown in Appendix 8 that, regardless of the oddness or evenness of $f_x$ and $f_y$, the value of this integral can be written as

$$\int_0^{\pi/2}(\cos\theta)^{f_x-1}(\sin\theta)^{f_y-1}\,d\theta = \frac12\,\frac{\Gamma(f_x/2)\,\Gamma(f_y/2)}{\Gamma(f/2)}. \tag{12-38}$$

When this result is combined with $2P(f_x)P(f_y)$, the distribution of $w^2$, that is, $\varphi(w^2, f)\,d(w^2)$, takes the form of Eq. (12-33) with $P(f)$ given by Eq. (12-34), except that now $f$ is the sum of the degrees of freedom of the two separate sets we started with. It should be reemphasized that this result applies for sets which are independent of each other.

Table 12-2

Experiment   Reading
     1         -4.5
     2          2.5
     3         -1.5
     4         -5.5
     5         -0.5
     6         -1.5
     7          1.5
     8         -3.5
     9          4.5
    10         -0.5

Analysis: $\bar{x} = -0.9$; $\sum x^2 = 96.5$; $(\sum x)^2/10 = 8.1$; $\therefore \sum v^2 = 88.4$; $\chi^2 = 7.87$; $0.5 < \mathcal{P}(\chi^2 > 7.87) < 0.7$ with $f = 9$.

We are now ready to consider some examples of the application of $\chi^2$ testing. The first four examples will be taken from the dart-dropping experiment described in Chapter 7. The first three will be somewhat artificial in that we will use arbitrary sets of data. The purpose is to illustrate the arithmetic and the way in which Table B is used. For these three illustrations we will assume that the parameters of the distribution, particularly the standard deviation, are known from a priori knowledge to be those in the last row of Table 10-5.
In particular, the values of $\sigma$ used in the first three examples are not calculated from the data of those examples.

The first example is shown in Table 12-2. Some discussion of the last column is necessary. First, the reader should show that $\sum v^2$ can be computed directly from

$$\sum wv^2 = \sum wx^2 - \frac{\left(\sum wx\right)^2}{\sum w}$$

(with all weights $w = 1$ here) [see Problem 10-1]. Second, as in the derivation, $\chi^2$ is just the sum of the squares of the residuals divided by the square of the standard deviation for the distribution. Third, since we are assuming $\sigma$ to be given, as in fact it is so far as this set of data is concerned, the only restriction involved is in the calculation of $\bar{x}$, that is, $\sum v = 0$. The number of degrees of freedom is one less than the number of readings. Fourth, when we refer to Table B for $f = 9$, we find that a $\chi^2$ as large as this could be expected to occur by chance more than half the time even when calculated on the basis of the actual distribution. It is interesting then to compute $\sigma$ from these observations. The result is 3.13, as compared with the "true" $\sigma$ of 3.352.

Suppose now that the data in Table 12-3 have been obtained. The analysis shows that only one time in 50 could one expect a $\chi^2$ as large as this to occur by chance when the correct distribution was used in its computation.

Table 12-3

Experiment   Reading
     1         -9.5
     2          6.5
     3          0.5
     4          0.5
     5         -6.5
     6         -4.5
     7          2.5
     8         -0.5
     9          4.5
    10         -2.5

Analysis: $\bar{x} = -0.9$; $\sum x^2 = 228.5$; $(\sum x)^2/10 = 8.1$; $\therefore \sum v^2 = 220.4$; $\chi^2 = 19.62$; $\mathcal{P}(\chi^2 > 19.62) \approx 0.02$ with $f = 9$.

This figure is in the range, 1 to 5 times in 100, where most statisticians would conclude that it is safe to say that the data do not fit the assumed distribution. The standard deviation computed from this set is 4.95.

Next, let us consider the data in Table 12-4. Here the converse is true. It is very unlikely that data drawn from a distribution having $\sigma = 3.352$ would have as small a value of $\chi^2$ as is shown by the data of Table 12-4.
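The arithmetic of Tables 12-2 and 12-3 can be reproduced as follows. This Python sketch (hypothetical names) takes $\sigma = 3.352$ as known a priori, as the text does, uses the shortcut $\sum v^2 = \sum x^2 - (\sum x)^2/n$, and estimates $\sigma$ from the data with the divisor $n - 1$; that divisor is an assumption on our part, chosen because it reproduces the quoted values 3.13 and 4.95.

```python
import math

SIGMA = 3.352  # taken as known a priori (last row of Table 10-5), not fitted to these data

def chi_square_known_sigma(readings, sigma=SIGMA):
    """Return (mean, sum of squared residuals, chi^2, f, sample estimate of sigma)."""
    n = len(readings)
    s = sum(readings)
    s2 = sum(x * x for x in readings)
    sum_v2 = s2 - s * s / n              # sum v^2 = sum x^2 - (sum x)^2 / n
    chi2 = sum_v2 / sigma ** 2           # squared residuals over the squared known sigma
    f = n - 1                            # one constraint: the mean comes from the data
    sigma_hat = math.sqrt(sum_v2 / f)    # assumed divisor n - 1
    return s / n, sum_v2, chi2, f, sigma_hat

table_12_2 = [-4.5, 2.5, -1.5, -5.5, -0.5, -1.5, 1.5, -3.5, 4.5, -0.5]
table_12_3 = [-9.5, 6.5, 0.5, 0.5, -6.5, -4.5, 2.5, -0.5, 4.5, -2.5]

for name, data in (("12-2", table_12_2), ("12-3", table_12_3)):
    mean, sum_v2, chi2, f, sigma_hat = chi_square_known_sigma(data)
    print(f"Table {name}: mean={mean:.1f}  sum v^2={sum_v2:.1f}  "
          f"chi^2={chi2:.2f}  f={f}  sigma_hat={sigma_hat:.2f}")
```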
The standard deviation calculated from the data of Table 12-4 is only 0.70.

Table 12-4

Experiment   Reading
     1         -1.5
     2         -1.5
     3         +0.5
     4         -0.5
     5         -1.5
     6         -0.5
     7         -1.5
     8         -0.5
     9         -0.5
    10         -1.5

Analysis: $\bar{x} = -0.9$; $\sum x^2 = 12.5$; $(\sum x)^2/10 = 8.1$; $\therefore \sum v^2 = 4.4$; $\chi^2 = 0.39$; $\mathcal{P}(\chi^2 > 0.39) > 0.99$; $\mathcal{P}(\chi^2 < 0.39) < 0.01$; with $f = 9$.

Finally, of course, the question arises as to how well the 498 points now in the dart-dropping data fit the normal distribution according to a $\chi^2$ test. The analysis shown in Table 12-5 demands the simultaneous application of many of the ideas which have previously been considered more or less separately. The first column of Table 12-5 gives the various intervals into which the target was divided. The second gives the expected number of drops in that interval according to a normal distribution with $\mu = -0.7229$ and $\sigma = 3.352$. The third column shows the observed number of drops in that interval, while the fourth shows the elementary probability of the single event, a landing of the dart in that interval. There are 21 lines in the table.

Table 12-5

   xᵢ    Expected  Observed    Pᵢ        σᵢ²      χᵢ²
 -10.5     0.85        2     0.0017 }
  -9.5     1.99        2     0.0040 }   6.826    0.001
  -8.5     4.08        3     0.0082 }
  -7.5     7.72       12     0.0155     7.599    2.410
  -6.5    13.55       10     0.0272    13.177    0.956
  -5.5    21.56       15     0.0433    20.630    2.086
  -4.5    31.37       35     0.0630    29.397    0.448
  -3.5    42.08       45     0.0845    38.525    0.221
  -2.5    51.34       49     0.1031    46.050    0.119
  -1.5    57.42       54     0.1153    50.799    0.230
  -0.5    59.06       70     0.1186    52.058    2.299
   0.5    55.23       48     0.1109    49.103    1.064
   1.5    47.41       55     0.0952    42.896    1.343
   2.5    37.40       32     0.0751    34.591    0.843
   3.5    26.84       32     0.0539    25.396    1.048
   4.5    17.68       15     0.0355    17.052    0.421
   5.5    10.71        8     0.0215    10.477    0.701
   6.5     5.88        4     0.0118     5.807    0.608
   7.5     2.94        2     0.0059 }
   8.5     1.39        2     0.0028 }   4.881    0.878
   9.5     0.60        3     0.0012 }

Σχᵢ² = 15.676,  f = 17 - 3 = 14,  0.3 < P(χ² > 15.676) < 0.5
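The pooled end cells and a single-cell entry of Table 12-5 can be recomputed from its $P_i$ column with $\sigma_i^2 = 498P_i(1 - P_i)$, as explained in the paragraphs that follow. In this Python sketch the helper name is hypothetical, and because the $P_i$ are taken as rounded in the table, the last digit may differ slightly from the printed entries (for the $x_i = -7.5$ row it gives about 2.412 rather than 2.410).

```python
N = 498  # total number of dart drops

def cell_chi_square(observed, probs, n_total=N):
    """Pool the listed intervals into one cell and return (expected, sigma^2, chi^2),
    with expected = N P and sigma^2 = N P (1 - P) for the pooled probability P."""
    p = sum(probs)
    expected = n_total * p
    var = n_total * p * (1.0 - p)
    return expected, var, (sum(observed) - expected) ** 2 / var

# End intervals of Table 12-5: observed counts and elementary probabilities P_i
low_tail = cell_chi_square([2, 2, 3], [0.0017, 0.0040, 0.0082])
high_tail = cell_chi_square([2, 2, 3], [0.0059, 0.0028, 0.0012])
single = cell_chi_square([12], [0.0155])   # the x_i = -7.5 row, one degree of freedom
print(low_tail, high_tail, single)
```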
We can regard the experiment as 498 trials at the rolling of a "die" having 21 sides such that the probability of a particular one of the sides turning up on a single trial is the figure given in the fourth column. We saw in Section 12-1 how to cope with this situation. For each side $i$, the expectation value is $498P_i$, and the square of the standard deviation in this number is $498P_i(1-P_i)$. The third column then shows the single observation made on the expectation value $498P_i$, and the fifth column shows the square of the appropriate standard deviation. The last column shows $\chi^2$ for each of these observations; it is the square of the single error at each point divided by the square of the standard deviation. Each such value of $\chi^2$, considered by itself, is a single observation with no constraint and has one degree of freedom.

The reader will note that at each end of the distribution, three intervals have been treated as one observation. Fry [2] has shown that the accuracy of the computed value of $\chi^2$ drops when the number of events which constitute an observation is too small; 5 is a convenient number to use as the limit. Thus in our case, one might say that we have converted the 21-sided "die" to one with 17 sides.

Reference to Table B for one degree of freedom shows that the probability of occurrence of values of $\chi^2$ of the magnitude in Table 12-5, or greater, ranges between about 10 and 98%. A better test involves the additivity property. The sum of all these values of $\chi^2$ is 15.676. However, we must recognize that the number of degrees of freedom for the sum of all of them is not quite 17. The additivity property applies to independent observations; when we add them all, the observations are not all independent.
There are three constraints. They come from the use of the data to compute the mean, the use of the data to compute the standard deviation, and the fact that the number of events must add up to 498. The number of degrees of freedom for $\sum\chi^2$ is then 14. Reference to Table B for $f = 14$ shows that a value of $\chi^2 \ge 15.676$ has a chance of occurrence of somewhere between 30 and 50 times out of 100. We conclude that the normal distribution with $\mu = -0.7229$ and $\sigma = 3.352$ is a satisfactory description of these data.

The last example to be shown is a coin-toss experiment. We will see that the analysis is much like that in the previous example. However, the distribution is binomial, and the probability of occurrence of an individual event is assumed to be known from a priori considerations. Thus when all the separate values of $\chi^2$ are summed, there is only one constraint. This is another way of saying that the numbers of tosses resulting in the individual events must add up to the total number of tosses.

The example to be used is the repeated tossing of four coins simultaneously and the observation, after each toss, of the number of heads which appear. With four coins there are five possible events; that is, anywhere from zero to four heads can appear. The probability of each of these events can be computed according to the methods of Section 6-6.

Table 12-6*

                                 50 tosses                      640 tosses
Number of heads, i   p_i     50p_i   obs.    χ²        640p_i   obs.    χ²
        0            1/16    3.125     4    0.26          40      38    0.11
        1            1/4    12.5      11    0.24         160     152    0.53
        2            3/8    18.75     21    0.43         240     245    0.17
        3            1/4    12.5      12    0.03         160     165    0.21
        4            1/16    3.125     2    0.43          40      40    0.00
Sum of χ²                                   1.39                        1.02
f = 4;   P(χ² > 1.39) ≈ 0.85;   P(χ² > 1.02) ≈ 0.91

* These data were taken years ago when one of the authors (G.H.W.) was a student of the other. The fifty tosses were made by G.H.W.; the 640 tosses are the sum of those made by the whole class.
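The entries of Table 12-6 can be checked with a short Python sketch. It is a minimal illustration, and its function names are our own. It takes the a priori probabilities as the binomial values $p_i = \binom{4}{i}/16$ and forms each $\chi_i^2$ with the variance $Np_i(1 - p_i)$ of Section 12-1. Because there is one constraint among five events, $f = 4$. For that case the tail probability has the closed form $P(\chi^2 > x) = e^{-x/2}(1 + x/2)$, which we use here in place of Table B. The sketch needs Python 3.8 or later for `math.comb`.

```python
import math


def coin_terms(observed, coins=4):
    """Per-event chi^2 for the number of heads in tosses of `coins` coins."""
    n_tosses = sum(observed)
    terms = []
    for heads, o in enumerate(observed):
        p = math.comb(coins, heads) / 2 ** coins  # a priori binomial probability
        expected = n_tosses * p
        terms.append((o - expected) ** 2 / (n_tosses * p * (1 - p)))
    return terms


def chi2_sf_f4(x):
    """P(chi^2 > x) for f = 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)


FIFTY = [4, 11, 21, 12, 2]        # Table 12-6: G.H.W.'s 50 tosses (0 ... 4 heads)
CLASS = [38, 152, 245, 165, 40]   # Table 12-6: 640 tosses by the whole class

results = {}
for label, data in (("50 tosses", FIFTY), ("640 tosses", CLASS)):
    terms = coin_terms(data)
    total = sum(terms)
    results[label] = (total, chi2_sf_f4(total))
    print(label, [round(t, 2) for t in terms],
          f"sum = {total:.2f}", f"P = {chi2_sf_f4(total):.2f}")
```

The sums come out near 1.39 and 1.02, and the tail probabilities come out near 0.85 and 0.91, in agreement with the table.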
Computation of the expected number of times each event will occur, and of its standard deviation, is as described in Section 12-1. Table 12-6 is basically the same as Table 12-5, though not as detailed. It shows the results of the experiment and the subsequent analysis. We conclude that the results of the experiment are not inconsistent with the assumption that the coins are good and that they were tossed randomly.

12-6 STUDENT'S t DISTRIBUTION: COMPARISON OF AVERAGES

In order to give a sufficiently useful description of $\chi^2$-testing, we had to give some consideration to the treatment of independent sets of observations drawn from the same parent distribution. We shall continue this subject in this section. The object is to find a method of numerical assessment of how much the averages of two such sets can be expected to differ from each other by chance. As before, we propose that when the difference is much greater than expected, we judge that the two sets were not in fact drawn from the same distribution.

For the variable $t$ we assume that the distributions are of the same form and have the same value of $\sigma$ for the two sets of data; we wish to see whether the distributions yield the same expectation values. The joint probability with which we must work is given by Eq. (12-36), but we shall proceed in a somewhat different direction from the discussion in the previous section. Two alternative expressions can be written for the sum of the squares of the real errors which appears in Eq. (12-36). We can proceed in the same manner as in the derivation of Eq. (12-22) and obtain

$$\sum(\text{error})^2 = q_x^2 + q_y^2 + n_1(\bar{x} - \mu)^2 + n_2(\bar{y} - \mu)^2, \tag{12-39}$$

where $q_x^2 = \sum(x_i - \bar{x})^2$ and $q_y^2 = \sum(y_i - \bar{y})^2$. On the other hand, the sums can be lumped together before expansion, so that they become

$$\sum(\text{error})^2 = \Big(\sum x_i^2 + \sum y_i^2\Big) - 2\mu\Big(\sum x_i + \sum y_i\Big) + (n_1 + n_2)\mu^2. \tag{12-40}$$

It is clear that $\sum x_i + \sum y_i = n_1\bar{x} + n_2\bar{y}$. The mean of all the readings from both sets, designated as $m$, is therefore

$$m = \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}. \tag{12-41}$$

Define $Q$ in a similar fashion to the $q$ of Eq. (12-20), so that $Q^2 = \sum(x_i - m)^2 + \sum(y_i - m)^2$. Expansion of this and the use of Eq. (12-41) lead to

$$Q^2 = \sum x_i^2 + \sum y_i^2 - \frac{(n_1\bar{x} + n_2\bar{y})^2}{n_1 + n_2}.$$

Thus, by going back to Eq. (12-40), we find

$$\sum(\text{error})^2 = Q^2 + \frac{(n_1\bar{x} + n_2\bar{y})^2}{n_1 + n_2} - 2\mu(n_1\bar{x} + n_2\bar{y}) + (n_1 + n_2)\mu^2. \tag{12-42}$$

But Eqs. (12-39) and (12-42) are expressions for the same quantity. Therefore,

$$Q^2 = q_x^2 + q_y^2 + \frac{n_1 n_2}{n_1 + n_2}(\bar{x} - \bar{y})^2, \tag{12-43}$$

where we see that the previously mentioned quantity of special interest, $(\bar{x} - \bar{y})$, has turned up. Equation (12-43) is now to be used to replace $Q^2$ in Eq. (12-42). At the same time, we note that when $(n_1 + n_2)^{-1}$ is factored out of the last three terms in Eq. (12-42), the terms constitute a perfect square. The expression for that square is simplified by the use of Eq. (12-41). The result is

$$\sum(\text{error})^2 = q_x^2 + q_y^2 + \frac{n_1 n_2}{n_1 + n_2}(\bar{x} - \bar{y})^2 + (n_1 + n_2)(m - \mu)^2. \tag{12-44}$$

Thus the argument of the exponential in Eq. (12-36) becomes

$$\text{arg exp} = -\frac{q_x^2 + q_y^2}{2\sigma^2} - \frac{(\bar{x} - \bar{y})^2}{2\left[\sigma\sqrt{1/n_1 + 1/n_2}\right]^2} - \frac{(m - \mu)^2}{2\left[\sigma/\sqrt{n_1 + n_2}\right]^2}. \tag{12-45}$$

When we use Eq. (12-39) as the sum of the squares of the errors, we are led to proceed with $\prod dx_i$ and $\prod dy_i$ exactly as for the derivation of the $\chi^2$-distribution. The individual factors of these two products are independent of each other, and the results of this procedure are just as before, except that there are two sets. This conclusion is not altered in any way by the rewriting of the sums of the squares of the errors that led to Eq. (12-45). When $d\mathcal{P}$ is then used to designate the factors of interest which are to be saved in Eq. (12-36), $d\mathcal{P}$ becomes

$$d\mathcal{P} = \frac{\sqrt{n_1 n_2}}{2\pi\sigma^2}\,\frac{(q_x/\sigma)^{f_x-1}\,(q_y/\sigma)^{f_y-1}}{2^{(f_x-2)/2}\,\Gamma(f_x/2)\;2^{(f_y-2)/2}\,\Gamma(f_y/2)} \times \exp\left\{-\frac{1}{2\sigma^2}\left[q_x^2 + q_y^2 + \frac{(\bar{x} - \bar{y})^2}{1/n_1 + 1/n_2} + (n_1 + n_2)(m - \mu)^2\right]\right\} d\!\left(\frac{q_x}{\sigma}\right) d\!\left(\frac{q_y}{\sigma}\right) d\bar{x}\,d\bar{y}. \tag{12-46}$$

Further changes of variables must now be made in order to integrate this expression partially. That is, we wish to know the distribution of $(\bar{x} - \bar{y})$ regardless of the values of $m$, $q_x$, and $q_y$. This distribution can be found by integrating over all possible values of $m$, $q_x$, and $q_y$ to see what is left for the difference $(\bar{x} - \bar{y})$. It is to be remembered that, by the nature of their definitions, $q_x$ and $q_y$ are positive numbers. In addition to $m$, defined in Eq. (12-41), we let three new variables be

$$q = \sqrt{q_x^2 + q_y^2}, \qquad \theta = \tan^{-1}\frac{q_y}{q_x}, \qquad t = \frac{(\bar{x} - \bar{y})\sqrt{f}}{q\sqrt{1/n_1 + 1/n_2}}, \qquad f = f_x + f_y = n_1 + n_2 - 2. \tag{12-47}$$

The old variables, expressed in terms of the new ones, are

$$q_x = q\cos\theta, \qquad q_y = q\sin\theta, \qquad \bar{x} = m + \frac{n_2\,q\,t}{\sqrt{f n_1 n_2 (n_1 + n_2)}}, \qquad \bar{y} = m - \frac{n_1\,q\,t}{\sqrt{f n_1 n_2 (n_1 + n_2)}}. \tag{12-48}$$

With the aid of the previous discussion and Appendix 7, we can show that the required Jacobian $J(q_x, q_y, \bar{x}, \bar{y}; q, \theta, t, m)$ is $(q^2/\sqrt{f})\sqrt{1/n_1 + 1/n_2}$, so that

$$dq_x\,dq_y\,d\bar{x}\,d\bar{y} = \frac{q^2}{\sqrt{f}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\;dq\,d\theta\,dt\,dm.$$

It is immediately apparent that the set of factors

$$\frac{1}{(\sigma/\sqrt{n_1 + n_2})\sqrt{2\pi}}\exp\left[-\frac{(m - \mu)^2}{2(\sigma/\sqrt{n_1 + n_2})^2}\right]dm$$

can be divided out with no further reference to $m$ being left behind. That is, as expected, the error in the grand average of all the readings from the two sets is normally distributed with the weight $(n_1 + n_2)$. The integration of its distribution function over all values of $m$ will result in unity. The remaining part, $d\mathcal{P}''$, is written as

$$d\mathcal{P}'' = \frac{(q\cos\theta/\sigma)^{f_x-1}\,(q\sin\theta/\sigma)^{f_y-1}}{2^{(f_x-2)/2}\,\Gamma(f_x/2)\;2^{(f_y-2)/2}\,\Gamma(f_y/2)}\;\frac{q^2}{\sigma^3\sqrt{2\pi f}}\exp\left[-\frac{q^2}{2\sigma^2}\left(1 + \frac{t^2}{f}\right)\right]dq\,d\theta\,dt. \tag{12-49}$$

Our original intention, which we stated before we changed the variables, was to integrate over all values of $m$, $q_x$, and $q_y$. The first has been done. The integration over the latter two can be accomplished with the new variables by integrating over $q$ from 0 to $\infty$ and over $\theta$ from 0 to $\pi/2$. Consider the first of these. The pertinent integral, taken from Eq.
(12-49), is

$$\int_0^{\infty} q^f \exp\left[-\frac{q^2}{2\sigma^2}\left(1 + \frac{t^2}{f}\right)\right]dq.$$

Appendix 6 shows that the value of this integral is

$$2^{(f-1)/2}\,\sigma^{f+1}\,\Gamma\!\left(\frac{f+1}{2}\right)\left(1 + \frac{t^2}{f}\right)^{-(f+1)/2},$$

so that

$$d\mathcal{P}''' = \frac{2\,\Gamma\!\left(\frac{f+1}{2}\right)}{\sqrt{\pi f}\,\Gamma(f_x/2)\,\Gamma(f_y/2)}\left(1 + \frac{t^2}{f}\right)^{-(f+1)/2}(\cos\theta)^{f_x-1}(\sin\theta)^{f_y-1}\,d\theta\,dt. \tag{12-50}$$

The remaining integration over $\theta$ was described in Section 12-5. When this has been carried out, the distribution of $t$ is found to be

$$\varphi(t, f)\,dt = \frac{\Gamma\!\left(\frac{f+1}{2}\right)}{\sqrt{\pi f}\;\Gamma(f/2)}\;\frac{dt}{\left(1 + t^2/f\right)^{(f+1)/2}}. \tag{12-51}$$

This is the distribution discussed by W. S. Gosset* in 1908 under the pen name "Student," and referred to ever afterward as "Student's t distribution." The quantity $t$ is defined in Eq. (12-47). It is seen to be proportional to the difference between the means of two separate sets of observations divided by a quantity $q/\sqrt{f}$. The latter is a sort of average of the estimates (for the separate sets) of the standard deviation of the parent distribution. That is,

$$\frac{q}{\sqrt{f}} = \left[\frac{\sum(x_i - \bar{x})^2 + \sum(y_i - \bar{y})^2}{(n_1 - 1) + (n_2 - 1)}\right]^{1/2}. \tag{12-52}$$

The distribution of $t$, however, depends only on the total number of degrees of freedom $f$.

Tests based on the $t$-distribution are similar to $\chi^2$-tests in that one is interested in the integral of the distribution function from some value to $\infty$. When such tests are made, one should realize that, unlike the $\chi^2$-distribution, the $t$-distribution (Eq. 12-51) is symmetric about the origin. Thus, to verify the normalization, one must integrate from $-\infty$ to $+\infty$ or, equivalently, multiply the integral from 0 to $\infty$ by two. It is suggested that the reader verify this, as well as one of the entries in Table C on p. 234. The procedures have been repeated sufficiently often that no further illustration is deemed necessary, although a couple of hints might be helpful. The substitution $t^2 = f\tan^2\theta$ is useful. Also, $\int_0^{\pi/2}(\cos\theta)^{f-1}\,d\theta$ can be found from Eq. (12-38) by setting $f_x = f$ and $f_y = 1$. The $f$ which appears on the right-hand side of Eq. (12-38) then has the value $f + 1$.
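The suggested verification can also be done numerically. The Python sketch below is our own illustration, and its helper names are hypothetical. It applies the hinted substitution $t = \sqrt{f}\tan\theta$. Under that substitution, Eq. (12-51) becomes $[\Gamma(\frac{f+1}{2})/(\sqrt{\pi}\,\Gamma(f/2))]\cos^{f-1}\theta\,d\theta$ on the finite range $-\pi/2 < \theta < \pi/2$. Simpson's rule then handles both the normalization and the two-sided tail that Table C tabulates. The check value 3.012 is the standard two-sided 1% point for $f = 13$. It is our reference value and is not quoted from this text.

```python
import math


def theta_integrand(theta, f):
    """phi(t, f) dt after t = sqrt(f) tan(theta): C cos(theta)**(f - 1) dtheta,
    with C = Gamma((f + 1)/2) / (sqrt(pi) Gamma(f/2))."""
    c = math.exp(math.lgamma((f + 1) / 2) - math.lgamma(f / 2)) / math.sqrt(math.pi)
    return c * math.cos(theta) ** (f - 1)


def simpson(g, a, b, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3


def normalization(f):
    """Integral of phi(t, f) over all t, i.e. over theta from -pi/2 to pi/2."""
    return simpson(lambda th: theta_integrand(th, f), -math.pi / 2, math.pi / 2)


def two_sided_tail(t, f):
    """Probability that |t| is at least the given value: twice the integral
    from t to infinity, the quantity listed in Table C."""
    theta0 = math.atan(abs(t) / math.sqrt(f))
    return 2 * simpson(lambda th: theta_integrand(th, f), theta0, math.pi / 2)


for f in (1, 4, 13):
    print(f, round(normalization(f), 8))
print(round(two_sided_tail(3.012, 13), 4))
```

The normalizations print as 1 for each $f$ tried, and the tail at 3.012 with $f = 13$ comes out close to 0.01.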
We note that the tabulated values of $t$ for particular values of the probability satisfy the equation

$$\int_{-\infty}^{-t}\varphi(t, f)\,dt + \int_t^{\infty}\varphi(t, f)\,dt = 2\int_t^{\infty}\varphi(t, f)\,dt.$$

Thus the table lists critical values of $|t|$. This is proper, since it is just as likely that the difference between the averages of two samples will be positive as negative.

A classic use of the $t$-test is in chemical analysis. For instance, an analysis is run for a particular element by two different methods, or two methods of preparation are used to produce what is, hopefully, the same compound or alloy. In the latter example, the same method of analysis would be used on the two products. In either case, one expects to get different answers in the two analyses. The question is whether the difference is within the bounds of what chance errors alone could produce even if, in the first example, the methods had no relative systematic error or, in the second example, the products were identical. The example shown in Table 12-7 should serve to illustrate the use of the $t$-test for this purpose.

* W. S. Gosset, Biometrika, VI, 1 (1908).

Table 12-7

Sample 1:  0.26  0.27  0.28  0.24  0.25  0.29  0.27  0.27  0.28  0.28     Av. 0.269    f_1 = 9
Sample 2:  0.29  0.30  0.31  0.30  0.29                                   Av. 0.298    f_2 = 4

Analysis:
q/√f = [(0.002090 + 0.000280)/(9 + 4)]^(1/2) = 0.0135
t = (0.298 − 0.269) / {0.0135 [1/10 + 1/5]^(1/2)} = 3.92
f = 9 + 4 = 13;   P(|t| > 3.92) < 0.01
Conclusion: the samples are different.

12-7 THE F-DISTRIBUTION: ANALYSIS OF VARIANCE

Equation (12-37) gives the joint distribution of the ratios of the sums of squares of residuals for samples of two different sizes to the square of the standard deviation for the distribution from which the samples were drawn. Immediately following that equation we used it to show the additive property of $\chi^2$. In Section 12-6 we used the immediate antecedent of Eq.
(12-37) to derive a distribution by means of which we could discuss the significance of an observed difference between the averages of the results for the two samples. In this section we wish to use Eq. (12-37) to derive another distribution, the F-ratio distribution, which is applicable to this case of two samples supposedly drawn from the same parent distribution. Whereas the $t$-test tests for differences in averages, the $F$-test tests for differences in variances, by way of their ratio. Variance is defined as the square of the standard deviation.

This new distribution can be derived quickly, since so much of the work has been done in previous sections. Given Eq. (12-37), we define a new variable as follows:

$$g = \frac{w_x^2/f_x}{w_y^2/f_y} = \frac{w_x^2}{w_y^2}\,\frac{f_y}{f_x}. \tag{12-53}$$

We wish to know the distribution of $g$ regardless of the values of $w_x^2$ and $w_y^2$. Hence we substitute in Eq. (12-37)

$$w_x^2 = \frac{f_x}{f_y}\,w_y^2\,g, \qquad d(w_x^2) = \frac{f_x}{f_y}\,w_y^2\,dg,$$

and then integrate over all values of $w_y^2$. The reader should verify that the result is

$$\varphi(g, f_x, f_y)\,dg = \frac{\Gamma(f/2)\,(f_x/f_y)^{f_x/2}}{\Gamma(f_x/2)\,\Gamma(f_y/2)}\;\frac{g^{(f_x-2)/2}\,dg}{(1 + f_x g/f_y)^{f/2}}, \tag{12-54}$$

where, as usual, $f = f_x + f_y$. We recall that the $w_x^2$ and $w_y^2$ of Eq. (12-53) are each ratios of the sums of squares of the residuals for the corresponding samples to the variance $\sigma^2$ of the single parent distribution function from which each sample was drawn. Thus Eq. (12-53) could have been written

$$g = \frac{q_x^2/f_x}{q_y^2/f_y},$$

which is just the ratio of the estimate of $\sigma^2$ given by the $x$ sample to the estimate of $\sigma^2$ given by the $y$ sample. Thus, when we write

$$\mathcal{P}(F, f_x, f_y) = \int_F^{\infty}\varphi(g, f_x, f_y)\,dg, \tag{12-55}$$

$\mathcal{P}(F, f_x, f_y)$ represents the probability that the ratio of these two estimates will be greater than or equal to $F$, purely by chance, when the variances for the two samples are in fact the same. As with $\chi^2$, the range, and $t$, values of $F$ are tabulated for particular values of $\mathcal{P}(F, f_x, f_y)$. The latter are generally 0.05 and 0.01.
However, this distribution is asymmetric, as is evident from the way in which $f_x$ and $f_y$ appear in Eq. (12-54). The F tables must therefore be much more extensive than those we have met so far. Table D on p. 236 shows that all the tabulated values of $F$ are greater than or equal to unity. The designation "Degrees of freedom for greater mean square" refers to the degrees of freedom used to calculate the larger of the two variances; in order to use the table directly, we insert the value of this variance into the numerator. However, just as either of two averages could be used first when one is calculating $t$, so could either of two variances be used in the numerator to calculate $F$ for the simpler type of application which we will consider first. The difficulty in the $t$-distribution was easily surmounted because of the symmetry of the distribution. With the $F$-distribution, we now face the same lack of simplicity in the assessment of probability information that we met in Section 9-4, in our discussion of the standard deviation for an asymmetric distribution.

Consider, for example, the table entry at the 5% level for both $f_x$ and $f_y$ equal to four. We see that, working at this level, an F-ratio of 10 would indicate inequality of the variances. A ratio of 0.1, obtained by inverting the ratio, must therefore also indicate inequality of the variances. In particular, then, a ratio between 6.39 and 1/6.39 would have a 90% (not 95%) probability of occurring by chance when in fact the variances are equal. We are thus led to examine the effect on Eqs. (12-54) and (12-55) of making the substitution $u = 1/g$. It is left as an exercise for the reader to show that

$$\int_F^{\infty}\varphi(g, f_x, f_y)\,dg = \int_0^{1/F}\varphi(u, f_y, f_x)\,du, \tag{12-56}$$

which demonstrates the obvious fact that a ratio of variances has the same distribution regardless of which one is in the numerator.
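A numerical check of these statements may help. The Python sketch below uses only the standard library, and the function name `f_upper_tail` is hypothetical. It evaluates $\mathcal{P}(F, f_x, f_y)$ of Eq. (12-55) by substituting $x = f_x g/(f_x g + f_y)$, which turns Eq. (12-54) into a beta density with parameters $f_x/2$ and $f_y/2$. It then integrates with Simpson's rule. This mapping is a standard identity, not the book's method, and the sketch assumes $f_x, f_y \ge 2$ so that the integrand stays finite. It checks that 6.39 is the 5% point for $f_x = f_y = 4$ and that about 90% of the probability lies between 1/6.39 and 6.39. It also checks Eq. (12-56) for one arbitrary choice of $F$.

```python
import math


def f_upper_tail(F, fx, fy, n=4000):
    """P(g >= F) for the F-ratio density of Eq. (12-54), by quadrature.

    With x = fx*g / (fx*g + fy), Eq. (12-54) becomes the beta density
    x**(a-1) * (1-x)**(b-1) / B(a, b), a = fx/2, b = fy/2, on 0 < x < 1.
    Assumes fx >= 2 and fy >= 2 so the integrand is finite at both ends.
    """
    a, b = fx / 2, fy / 2
    inv_beta = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

    def density(x):
        return inv_beta * x ** (a - 1) * (1 - x) ** (b - 1)

    x0 = fx * F / (fx * F + fy)
    h = (1 - x0) / n  # n must be even for Simpson's rule
    s = density(x0) + density(1.0)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * density(x0 + k * h)
    return s * h / 3


# f_x = f_y = 4: the 5% point is 6.39.
upper = f_upper_tail(6.39, 4, 4)
between = f_upper_tail(1 / 6.39, 4, 4) - upper  # P(1/6.39 <= g < 6.39)

# Eq. (12-56): the upper tail at F equals the lower tail of the inverted ratio.
lhs = f_upper_tail(3.0, 9, 4)
rhs = 1 - f_upper_tail(1 / 3.0, 4, 9)
print(round(upper, 4), round(between, 3), round(lhs, 6), round(rhs, 6))
```

Given a table of F, the same function can confirm the values 6.00 for $(9, 4)$ and 3.63 for $(4, 9)$ used in the next example.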
Equation (12-56) also shows how the tables can be extended to ratios less than unity. As an example, consider the data used to illustrate the $t$-test in Table 12-7. Here, $q_1^2/f_1 = 0.000232$ and $q_2^2/f_2 = 0.00007$, with $f_1 = 9$ and $f_2 = 4$. With the greater mean square in the numerator, $F = 3.32$. The critical value at the 5% level, as seen from Table D, is 6.00. We now need the critical value of $F < 1$, still with 9 degrees of freedom for the numerator mean square and 4 degrees of freedom for the denominator mean square. We see from Eq. (12-56) that this is the reciprocal of the value of $F$ obtained from Table D such that

$$0.05 = \int_F^{\infty}\varphi(g, 4, 9)\,dg;$$

$F$ is found to be 3.63. Thus there is a 90% chance that the observed value of $F$ will lie between 0.28 ($= 1/3.63$) and 6.00, even though the true variances are equal for the two samples. The value 3.32 is therefore not inconsistent with the equality of the variances. It is interesting, however, to note that these data did indicate a real difference in the means of the two samples.

For cases like the above, there is no criterion by which one could decide to place one or the other of the two variances in the numerator. It has been suggested [13] that in such cases the larger variance be placed in the numerator, and that the critical value of $F$, read at the 5% level, say, be interpreted as the value which will be exceeded 10% of the time, by chance, when the variances are equal. Considering the large differences in critical values of $F$ at the 1 and 5% levels for small samples, this is probably satisfactory for most work. We note from Table D, however, that this approximation approaches exactitude only when both samples are large. Thus, in the example used above, we worked with critical values of 6.00 and 3.63, which differ by almost 50%. At values of $f$ equal to 40 and 50, say, these two numbers become 1.63 and 1.66.
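The arithmetic for Table 12-7 can be reproduced with a small Python sketch. Both the $t$ value of Section 12-6 and the F-ratio above come from the same two sums of squares. The variable names are our own. Equation (12-52) supplies the pooled estimate and Eq. (12-47) the value of $t$. The F-ratio puts the greater mean square in the numerator, as in the text.

```python
import math

SAMPLE_1 = [0.26, 0.27, 0.28, 0.24, 0.25, 0.29, 0.27, 0.27, 0.28, 0.28]
SAMPLE_2 = [0.29, 0.30, 0.31, 0.30, 0.29]


def mean_and_q2(xs):
    """Average and sum of squares of residuals q^2 = sum (x - mean)^2."""
    mean = sum(xs) / len(xs)
    return mean, sum((x - mean) ** 2 for x in xs)


m1, q1_sq = mean_and_q2(SAMPLE_1)
m2, q2_sq = mean_and_q2(SAMPLE_2)
f1, f2 = len(SAMPLE_1) - 1, len(SAMPLE_2) - 1

# Eq. (12-52): pooled estimate q/sqrt(f) of the parent standard deviation.
s_pooled = math.sqrt((q1_sq + q2_sq) / (f1 + f2))
# Eq. (12-47): Student's t for the difference between the two averages.
t = (m2 - m1) / (s_pooled * math.sqrt(1 / len(SAMPLE_1) + 1 / len(SAMPLE_2)))

# Mean squares, and the F-ratio with the greater one in the numerator.
ms1, ms2 = q1_sq / f1, q2_sq / f2
F = max(ms1, ms2) / min(ms1, ms2)

print(f"t = {t:.2f} with f = {f1 + f2};  F = {F:.2f} with ({f1}, {f2})")
```

This prints $t = 3.92$ with $f = 13$ and $F = 3.32$ with $(9, 4)$. The same data thus give a significant $t$, while the F-ratio stays inside the band from 0.28 to 6.00.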
Possibly the most extensive use of the $F$-distribution is in conjunction with the procedure called "analysis of variance." This procedure can be described as a partition, or separation, of the total variance of a particularly organized set of observations into parts assignable to different sources. One of these sources will always be the random error assumed to be common to all the observations. The F-ratio distribution is then used to examine the ratio of the variances from the other sources to the error variance. The purpose is to assess the probability that the readings differ by more than can be assigned to the random error distribution. Thus there will be no problem here as to which variance is to be put into the denominator.

In our discussion of the $t$-distribution and its use, we suggested as an example that two different people performing the same experiment with the same equipment might well get different results. The tests proposed there were designed to make a critical examination of such a suggestion. Thus we make the hypothesis that the results are not "really" different (for instance, it is not really true that one of the observers is nearsighted and the other farsighted). Under this hypothesis, the results only appear to be different because of the chance distribution of the errors in the observations, all of which were drawn from the same population. The probability, under this assumption, of the event that did occur (namely, the size of the difference between the two average results) is then examined. If it is large, we conclude that the hypothesis is verified. If the probability of the actual event, under the hypothesis of no difference, turns out to be very small, we usually conclude that the hypothesis is erroneous. The principal use to which the F-distribution is put is, in a manner of speaking, an extension of this idea.
To continue with an expanded version of the above example, one might imagine two, or three, or more people making observations on the elastic limit of a particular steel after it has been annealed and quenched from each of several successively higher temperatures. In addition to the questions raised earlier relative to the inherent random error vs. any systematic differences between the observers or their methods, there is now the equally interesting question as to whether the heat treatment affects the elastic limit. To be able to answer the latter question, one must estimate the effect of changing observers, the effect of the heat treatment, and the random error that is hypothesized to be common to all observers and heat treatments. Again we make what is called the "null" hypothesis, that there are no real differences between observers or between heat treatments, and then assess the probability of the observed result. That is, we determine certain values of $q^2/f$ and their ratios. We then determine the probability that these ratios have the values they are observed to have, on the supposition that all observations were drawn from a single population with unique values for the mean $\mu$ and the standard deviation $\sigma$.

It is clear from the above discussion that we envisage an array of numbers $X_{ij}$, which can be arranged as follows:

$$\begin{matrix} X_{11} & X_{12} & \cdots & X_{1m} \\ X_{21} & X_{22} & \cdots & X_{2m} \\ \vdots & \vdots & & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{nm} \end{matrix} \tag{12-57}$$

where $i = 1, \ldots, n$ and $j = 1, \ldots, m$. The various values of $j$ may correspond to the analysts, and the various values of $i$ to the heat treatments. In effect, all the readings in a row, each taken by a different analyst, can be used to estimate the effect of the heat treatment corresponding to that row. All the readings in a particular column can be used to estimate the systematic error associated with the observer who made the observations in that column.
It must be realized that, while it is easier to describe these ideas as has just been done, all that one can actually test is whether or not one heat treatment has a different effect from that of another, or whether there are differences in the systematic errors associated with the different analysts. Under the null hypothesis it is assumed that neither analyst nor heat treatment introduces any systematic (nonrandom) effect, and that there is a single answer $X$ for the elastic limit. Were the null hypothesis violated, however, $X_{ij}$ could be written as

$$X_{ij} = X + E_i + E_j + v_{ij}, \tag{12-58}$$

where $E_i$ is the fixed systematic "error" contributed by the $i$th heat treatment and $E_j$ is similarly the contribution by the $j$th analyst. The term $v_{ij}$ is the particular random error that happened to turn up at the $ij$th reading. It is drawn from the population associated with the kind of experiment being performed.

The mean of the $j$th column then would be

$$\bar{X}_j = X + \frac{1}{n}\sum_i E_i + E_j + \frac{1}{n}\sum_i v_{ij}. \tag{12-59}$$

The mean of the $\bar{X}_j$ is, of course, the mean of all the readings. The sum of the squares of the residuals between each of the $\bar{X}_j$ and this mean would be

$$\sum_j \bar{X}_j^2 - \frac{1}{m}\Big(\sum_j \bar{X}_j\Big)^2 = \sum_j E_j^2 - \frac{1}{m}\Big(\sum_j E_j\Big)^2 + \frac{1}{n^2}\left[\sum_j\Big(\sum_i v_{ij}\Big)^2 - \frac{1}{m}\Big(\sum_j\sum_i v_{ij}\Big)^2\right] + \frac{2}{n}\left[\sum_j E_j\sum_i v_{ij} - \frac{1}{m}\Big(\sum_j E_j\Big)\Big(\sum_j\sum_i v_{ij}\Big)\right]. \tag{12-60}$$

We see that no sum of the $E_i$ appears in (12-60). Those sums of the $v_{ij}$ which do appear will become proportionately less as $n$ increases, if the $E_j$ are sufficiently large and nonrandom. Thus the numerical value of the right-hand side of Eq. (12-60) serves as an estimate of the sum of the squares of the residuals between the various $E_j$ and their mean. Let us emphasize here that, in practice, one does not know any of the $E_j$ but must estimate their mean and then calculate residuals as usual. A degree of freedom is lost in the process; the counting of degrees of freedom will be treated in the following discussion.
A similar expression can be found for the sum of the squares of the residuals between the row means and their mean. Consider next the sum of the squares of the residuals between each $X_{ij}$ and the mean of all the $X_{ij}$ (the grand mean). This sum will be

$$\sum_i\sum_j X_{ij}^2 - \frac{1}{mn}\Big(\sum_i\sum_j X_{ij}\Big)^2 = m\left[\sum_i E_i^2 - \frac{1}{n}\Big(\sum_i E_i\Big)^2\right] + n\left[\sum_j E_j^2 - \frac{1}{m}\Big(\sum_j E_j\Big)^2\right] + \left[\sum_i\sum_j v_{ij}^2 - \frac{1}{mn}\Big(\sum_i\sum_j v_{ij}\Big)^2\right] + 2\left[\sum_i E_i\sum_j v_{ij} - \frac{1}{n}\Big(\sum_i E_i\Big)\Big(\sum_i\sum_j v_{ij}\Big)\right] + 2\left[\sum_j E_j\sum_i v_{ij} - \frac{1}{m}\Big(\sum_j E_j\Big)\Big(\sum_i\sum_j v_{ij}\Big)\right]. \tag{12-61}$$

Some of these terms are identical with those seen previously. Multiply the sum of squares calculated for the column means by $n$, and the sum of squares of the residuals between the row means by $m$. If each of these products is then subtracted from the total as given in Eq. (12-61), we will find

$$\sum_i\sum_j v_{ij}^2 - \frac{1}{mn}\Big(\sum_i\sum_j v_{ij}\Big)^2 - \frac{1}{m}\left[\sum_i\Big(\sum_j v_{ij}\Big)^2 - \frac{1}{n}\Big(\sum_i\sum_j v_{ij}\Big)^2\right] - \frac{1}{n}\left[\sum_j\Big(\sum_i v_{ij}\Big)^2 - \frac{1}{m}\Big(\sum_i\sum_j v_{ij}\Big)^2\right].$$

Here none of the $E_i$ and $E_j$ appear. Thus this is the estimate of the random error. Again, every one of these terms except the first will become insignificantly small as the number of observations increases.

The reason for multiplying by $n$ and $m$ before separation of the sums from the total sum of squares should be brought out. It is clear that when the total sum of squares of the residuals was calculated, each observation was given the same weight, which may be called unity. The weight of a column mean, however, is $n$, since $n$ readings were used in its determination. The sum of squares for readings of weight $n$ is $1/n$ times that for the same number of readings of unit weight. To make the proper comparison between sums of squares (here, by subtraction), they must all be of the same weight, so we must convert all of them to unit weight. The same argument applies to the row means, except that $n$ is replaced by $m$. Our discussion so far has been confined to the sums of the squares of the residuals.
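The claim that the remainder contains none of the $E_i$ and $E_j$ can be tested numerically. The Python sketch below builds arrays according to Eq. (12-58), using the random errors $v_{ij}$ of Table 12-9, with the two unlisted entries in row 2 taken as zero as Table 12-10 requires. The values of $X$, $E_i$, and $E_j$ in the sketch are arbitrary random values of our own. The test is whether the remainder after subtracting the weighted column and row sums of squares matches the same remainder computed from the $v_{ij}$ alone. That remainder from the $v_{ij}$ alone is exactly the last expression above. The function name `partition` is hypothetical.

```python
import random


def partition(X):
    """Total SS, n*(SS of column means), m*(SS of row means), and the remainder."""
    n, m = len(X), len(X[0])  # n rows (index i), m columns (index j)
    grand = sum(map(sum, X)) / (n * m)
    total = sum((x - grand) ** 2 for row in X for x in row)
    row_means = [sum(row) / m for row in X]
    col_means = [sum(X[i][j] for i in range(n)) / n for j in range(m)]
    ss_rows = m * sum((r - grand) ** 2 for r in row_means)
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)
    return total, ss_cols, ss_rows, total - ss_cols - ss_rows


# Random errors v_ij of Table 12-9.
V = [[-0.1, 0.2, -0.4, 0.1],
     [0.5, 0.0, -0.4, 0.0],
     [-0.1, 0.3, -0.3, 0.2]]
error_from_v = partition(V)[3]

random.seed(12)
remainders = []
for trial in range(5):
    X0 = random.uniform(-10, 10)                   # arbitrary "single answer" X
    E_row = [random.uniform(-5, 5) for _ in V]     # arbitrary E_i
    E_col = [random.uniform(-5, 5) for _ in V[0]]  # arbitrary E_j
    X = [[X0 + E_row[i] + E_col[j] + V[i][j] for j in range(4)] for i in range(3)]
    remainders.append(partition(X)[3])

print(round(error_from_v, 4), [round(r, 4) for r in remainders])
```

Every remainder agrees with the value obtained from the $v_{ij}$ alone, about 0.298, whatever $X$, $E_i$, and $E_j$ are chosen.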
Since we are using the $F$-distribution, we must calculate ratios of the form

$$F = \frac{q_1^2/f_1}{q_2^2/f_2},$$

where $f_1$ is the number of degrees of freedom associated with the sum of squares $q_1^2$, and similarly for $f_2$ and $q_2^2$. In most applications (all of those here, in fact) subscript 2 refers to the error estimate, and subscript 1 refers alternately to the columns and rows of an array like (12-57). In such an application, the numbers of degrees of freedom can be easily obtained. To get the total sum of squares, we must calculate the grand mean. Thus the total sum of squares has a number of degrees of freedom $f_{\text{total}} = mn - 1$. When the sum of squares between column means is to be calculated, we must first calculate, in principle at least, a mean of the column means. Thus this number of degrees of freedom is $f_{\text{col}} = m - 1$, and similarly, $f_{\text{row}} = n - 1$. The number of degrees of freedom left by which the error variance can be estimated is $f_e = f_{\text{total}} - f_{\text{col}} - f_{\text{row}}$.

Table 12-8

 i \ j        1      2      3      4       X̄_i     X̄_i − X̄
   1          5      4      3      2       3.5       −1
   2          6      5      4      3       4.5        0
   3          7      6      5      4       5.5       +1
 X̄_j          6      5      4      3
 X̄_j − X̄    +1.5   +0.5   −0.5   −1.5                       X̄ = 4.5

4[ΣX̄_i² − (1/3)(ΣX̄_i)²] = 8,   3[ΣX̄_j² − (1/4)(ΣX̄_j)²] = 15

Table 12-9

 i \ j        1      2      3      4
   1        −0.1   +0.2   −0.4   +0.1
   2        +0.5    0     −0.4    0
   3        −0.1   +0.3   −0.3   +0.2

Table 12-10

 i \ j        1       2       3       4        X̄_i     X̄_i − X̄
   1         4.9     4.2     2.6     2.1      3.450    −1.050
   2         6.5     5.0     3.6     3.0      4.525     0.025
   3         6.9     6.3     4.7     4.2      5.525     1.025
 X̄_j        6.100   5.167   3.633   3.100
 X̄_j − X̄    1.600   0.667  −0.867  −1.400                       X̄ = 4.5

ΣΣX_ij² − (1/12)(ΣΣX_ij)² = 26.060
4[ΣX̄_i² − (1/3)(ΣX̄_i)²] = 8.615,   3[ΣX̄_j² − (1/4)(ΣX̄_j)²] = 17.150
26.060 − 8.615 − 17.150 = 0.295

Table 12-11

Source     f      q²       q²/f       F       F_99
Total     11    26.06     2.369
Column     3    17.150    5.717    116.20     9.78
Row        2     8.615    4.308     87.56    10.9
Error      6     0.295    0.0492

The above description is convenient for use. But it might be more satisfying to look at $f_e$, for instance, in another way. Let us count the actual number of independent quantities that have been calculated.
It is clear that one of these is the grand mean. It is also evident that the mean of the column means is the same as the grand mean. Thus, when $m - 1$ of the column means have been determined, the $m$th can be calculated from the first $m - 1$ and the grand mean, so that only $m - 1$ of them are independent. The same argument applies to the row means; only $n - 1$ of them are independent. Thus the number of independent quantities that have been calculated is $1 + (m - 1) + (n - 1)$, and, as usual, the number of degrees of freedom for the error calculation is $f_e = mn - [1 + (m - 1) + (n - 1)]$.

As an introductory example, consider an array such as (12-57) in which there are very definite differences assignable to rows and columns, but there is no error. Suppose $n = 3$, $m = 4$, $E_i = 1, 2, 3$, and $E_j = 4, 3, 2, 1$, with the $X$ of Eq. (12-58) also equal to zero. Then the array becomes Table 12-8. The total sum of squares, 23, was partitioned into 8 for the differences between the means of the rows and 15 for the differences between the means of the columns. We see that an $X$ of 4.5 was found, emphasizing the fact that there is no way of determining the $X$ used in the construction of the table. Correspondingly, individual values of the $E_i$ and $E_j$ cannot be found. This example, where there is no error, shows that the unit differences between the $E$'s show up in the differences between individual row and column means.

Now suppose that the error assignments shown in Table 12-9 are made to the various $ij$th locations. The observation table is then as in Table 12-10. As a result of the partition shown in Table 12-10, we see that there is now something left over for an estimate of the error, so that it is possible to proceed with the elementary analysis as outlined in the table. The sum of the squares associated with row means, 8.615, has 2 degrees of freedom, so that the variance here is 4.308. The error sum of squares has 6 degrees of freedom, so that the error variance is 0.0492. The F-ratio for the effect of possible systematic differences between rows is then 87.56. We now have to compare this ratio with the correct number in Table D. For 2 numerator and 6 denominator degrees of freedom, the table gives 5.14 at the 95% level. This means that any value of $F$ for these degrees of freedom less than 5.14 lies in the range that includes 95% of all the values of $F$, while one as large as 87.56 lies in a range that includes only 5% of the values. Even at the 99% level this limiting value is only 10.9, so that one can conclude that it is very improbable that a value as large as 87.56 would occur by chance. One reaches a similar conclusion, of course, concerning the column means. The results are summarized in Table 12-11.

As a further illustration, we can multiply the $v_{ij}$ of Table 12-9 by 10 before insertion. This makes the errors comparable to the between-column and between-row differences. We leave it as an exercise for the reader to reproduce Table 12-12.

Table 12-12

Source     f      q²       q²/f      F      F_95    F_99
Total     11    131.00    11.91
Column     3     85.68    28.56     5.74    4.76    9.78
Row        2     15.50     7.75     1.56    5.14   10.9
Error      6     29.82     4.97

Table 12-13

  L1      L2      L3      L4      L5      L6      L7
 1271    1268    1269    1267    1269    1267    1267
 1423    1419    1420    1420    1419    1416    1420
 1553    1549    1548    1550    1549    1550    1550
 1764    1761    1760    1760    1762    1764    1763
 1953    1951    1951    1950    1952    1953    1953
 2050    2043    2044    2043    2043    2048    2048
 2218    2215    2217    2216    2218    2220    2219

Table 12-14
DIFFERENCES BETWEEN NBS (1958) AND OTHER LABORATORIES (LAMP NO. 242)

Current, amp    L1    L2    L3    L4    L5    L6    L7
    8.62        −4    −1    −2     0    −2     0     0
    9.77        −3    +1     0     0    +1    +4     0
   10.90        −3    +1    +2     0    +1     0     0
   12.99        −4    −1     0     0    −2    −4    −3
   15.07        −3    −1    −1     0    −2    −3    −3
   16.16        −7     0    −1     0     0    −5    −5
   18.28        −2    +1    −1     0    −2    −4    −3

Table 12-15

Source            f      q²       q²/f      F      F_95    F_99
Total            48    205.06     4.27
Lab (column)      6     75.63    12.60     5.92    2.36    3.35
Current (row)     6     52.78     8.80     4.13    2.36    3.35
Error            36     76.65     2.129
In Table 12-12, there is still an indication of a difference in the between-column effect. The observed value of $F$ has a probability of occurrence by chance of around 4% when there is no real difference. The value of $F$ for the between-row differences has a very high probability of occurring purely by chance when there is no real difference between the rows.

We shall conclude this section with a real example.* In Table 12-13 are listed the temperatures of the filament of a special type of lamp as observed by several national standardizing laboratories. Each row corresponds to a particular current in the lamp. It is clear that there is no point in making between-row calculations on these raw temperatures; obviously there will be differences much larger than any errors that such laboratories might make. It is convenient then to use one of the laboratories, $L_4$ say, as a base and origin, and to take the difference between its observation and each of the others in the same row. The result is Table 12-14, which is much more manageable. The analysis of variance is shown in Table 12-15. The Lab ratio exceeds the critical F-ratios. This strongly suggests that there is a systematic difference between these laboratories that is larger than the error made in observing a temperature at any one of them. It is seen also that the between-current to error F-ratio exceeds the critical F-ratios. Moreover, Table 12-14 shows an odd distribution of values in the rows, particularly noticeable in columns $L_6$ and $L_7$. This kind of distribution demands analysis of a degree of sophistication beyond this book, though the simple ratio did point out that there is something there worth examining. Thus even the simplest variance analysis was very useful in a complex case.

* R. J. Thorn and G. H. Winslow, Rev. Sci. Instr., 33, 961 (1962).
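The partitions in Tables 12-11, 12-12, and 12-15 can all be reproduced with one small routine. The Python sketch below uses only the standard library, and the names `anova_two_way` and `build` are our own. It follows the recipe described above. It computes the grand mean and the row and column sums of squares weighted by $m$ and $n$, obtains the error sum by subtraction, and uses degrees of freedom $mn - 1$, $m - 1$, and $n - 1$.

- Table 12-10 is rebuilt from the construction of Table 12-8 plus the errors of Table 12-9.
- Table 12-12 is obtained by multiplying those errors by 10.
- In Table 12-14, the $L_4$ base column and any entries not printed are taken as zero.

Carried through without intermediate rounding, the Table 12-10 partition gives 17.147 for columns and 0.298 for error. The F-ratios therefore come out near 115 and 86.6 rather than the printed 116.20 and 87.56. The difference comes from rounding the column means and does not affect any conclusion.

```python
def anova_two_way(X):
    """Two-way analysis of variance, one reading per cell (array 12-57).

    Returns {source: (degrees of freedom, sum of squares q^2, F or None)}.
    """
    n, m = len(X), len(X[0])
    grand = sum(map(sum, X)) / (n * m)
    ss_total = sum((x - grand) ** 2 for row in X for x in row)
    ss_row = m * sum((sum(row) / m - grand) ** 2 for row in X)
    ss_col = n * sum((sum(X[i][j] for i in range(n)) / n - grand) ** 2
                     for j in range(m))
    ss_err = ss_total - ss_row - ss_col
    f_total, f_col, f_row = n * m - 1, m - 1, n - 1
    f_err = f_total - f_col - f_row
    ms_err = ss_err / f_err
    return {
        "total": (f_total, ss_total, None),
        "column": (f_col, ss_col, (ss_col / f_col) / ms_err),
        "row": (f_row, ss_row, (ss_row / f_row) / ms_err),
        "error": (f_err, ss_err, None),
    }


E_ROW, E_COL = [1, 2, 3], [4, 3, 2, 1]   # construction of Table 12-8 (X = 0)
V = [[-0.1, 0.2, -0.4, 0.1],             # errors of Table 12-9
     [0.5, 0.0, -0.4, 0.0],
     [-0.1, 0.3, -0.3, 0.2]]


def build(scale):
    """Table 12-10 for scale = 1; the Table 12-12 exercise for scale = 10."""
    return [[E_ROW[i] + E_COL[j] + scale * V[i][j] for j in range(4)]
            for i in range(3)]


TABLE_12_14 = [  # rows: currents 8.62 ... 18.28 amp; columns: L1 ... L7
    [-4, -1, -2, 0, -2, 0, 0],
    [-3, 1, 0, 0, 1, 4, 0],
    [-3, 1, 2, 0, 1, 0, 0],
    [-4, -1, 0, 0, -2, -4, -3],
    [-3, -1, -1, 0, -2, -3, -3],
    [-7, 0, -1, 0, 0, -5, -5],
    [-2, 1, -1, 0, -2, -4, -3],
]

for name, data in (("Table 12-11", build(1)), ("Table 12-12", build(10)),
                   ("Table 12-15", TABLE_12_14)):
    print(name)
    for source, (f, ss, F) in anova_two_way(data).items():
        line = f"  {source:7s} f = {f:2d}  q^2 = {ss:8.3f}  q^2/f = {ss / f:7.3f}"
        print(line + (f"  F = {F:6.2f}" if F is not None else ""))
```

For the laboratory data, the run gives sums of squares of 205.06, 75.63, 52.78, and 76.65 with 48, 6, 6, and 36 degrees of freedom. The corresponding F-ratios are 5.92 and 4.13, as in Table 12-15.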
12-8 THE TWO-DIMENSIONAL NORMAL DISTRIBUTION; THE CORRELATION COEFFICIENT

It was mentioned in the introduction to this chapter that some distributions are two-dimensional. In this section we shall attempt to face this problem, but our discussion will be severely limited compared with what could be said on this subject or on its natural extension to multidimensional distributions. Since any presentation of this topic easily goes far beyond the scope of this text, we shall confine ourselves to the parent distribution for two normally distributed variables which are said to be in linear correlation. Moreover, we shall restrict our discussion even for this case. We will not use the parent two-dimensional distribution to derive any distribution functions for finite-sized samples in the way in which we used a one-dimensional parent normal distribution to derive the $\chi^2$, Student's $t$, and $F$-ratio distributions. Rather, we will present this distribution only to give the reader an introduction to the implications of multidimensional distributions. We also wish to emphasize the distinction between the parameter of the parent two-dimensional distribution which is called the correlation coefficient and a quantity calculated for a finite sample which is called by the same name, and hence to show why the latter is defined as it is. The distribution of the sample correlation coefficient can be derived without reference to the parent two-dimensional distribution, and this is what we will do.

As stated above, we are concerned with the observations of two variables, taken in pairs. Let us suppose that the relation between them is of the linear form
$$y = A + Bx \tag{12-62}$$
and review the meaning which this form has had for us up to this point. We have viewed the variable $x$ as a nonrandom variable, the value of which could be established at will during the course of an experiment.
We observed values of $y$ at various values of $x$ in order to gather sufficient information for estimating the parameters $A$ and $B$ to some desired and calculable degree of precision. We found that repeated observations of $y$ at the same $x$ could not be expected to yield the same results. Evidently, then, we assumed that at each value of $x$ there was some mean value $\mu$ of $y$ which we wanted to estimate by performing the experiment. In the context of previous discussions, the "true" relation was assumed to be
$$\mu = A_0 + B_0 x. \tag{12-63}$$
In the following discussion we will also see that we have assumed the standard deviation in $y$ to be the same at each value of $x$. Thus we have been studying a series of one-dimensional distributions of $y$, each with the same standard deviation and with a mean determined by the assigned value of $x$.

The distinction between the above situation and the one in which both $x$ and $y$ must be considered as random variables is most easily understood through an illustration. Suppose now that we are not performing a controlled experiment but are merely observing pairs $(x, y)$, where the occurrence of particular values of either is entirely beyond our control. Such a situation might be illustrated by the relation between the height and weight of 25-year-old men, used as an example in the introduction to this chapter; by the relation between the temperature and the total runs per game in big-league baseball, where it might be supposed that there is a tendency toward fewer runs at higher temperatures; or by the relation between student grades in mathematics and in physics or chemistry; and so on. In the first and third of these examples there is clearly no basis on which to say that a change in one of the variables causes a change in the other in the way that we can say that a change in temperature causes a change in vapor pressure.
Neither can this be said in the case of the second example, since air temperature and humidity tend to vary together, and the latter might be expected to have as great an effect on the number of runs per game as the former. Thus in cases like these, each variable must be considered as random.

In deriving the following particular distribution, we assume that each of the variables, considered separately, is normally distributed. That is, we can assume some single mean value $\mu_y$ of $y$ and a standard deviation $\sigma_y$ in $y$, and similarly for $x$. Further, we can extend our earlier remarks about the existence of a normal distribution of values of $y$ for a given value of $x$ to the existence of a normal distribution of values of $x$ for a given value of $y$.

For the sake of convenience in later descriptions, let us introduce a new term. When we plot $y$ as a function of $x$ according to Eq. (12-62), we speak of the line through the points as the line of regression of $y$ on $x$; conversely, we speak of the line of regression of $x$ on $y$. An examination of Eq. (10-7), which is one of the equations used to determine $A$ and $B$ for a finite number of observations, shows that we can write this equation as $\bar y = A + B\bar x$. In the limit of an infinite number of observations this would become
$$\mu_y = A_0 + B_0\,\mu_x, \tag{12-64}$$
where $A_0$ and $B_0$ are the true values for the line of regression of $y$ on $x$. If we examined the line of regression of $x$ on $y$, we would similarly find
$$\mu_x = a_0 + b_0\,\mu_y, \tag{12-65}$$
where $\mu_x$ and $\mu_y$ are the same in Eqs. (12-64) and (12-65), but it is not necessarily true that
$$a_0 = -\frac{A_0}{B_0}, \qquad b_0 = \frac{1}{B_0}. \tag{12-66}$$
Consider the probability of observing a particular $(x, y)$ pair in the context of the line of regression of $y$ on $x$. For a particular value of $x$, the mean value of $y$ is
$$\mu = \mu_y + B_0(x - \mu_x), \tag{12-67}$$
where we have used Eq. (12-64) to eliminate $A_0$.
We let the standard deviation for the distribution of the values of $y$ at this value of $x$ be represented by $\sigma$, and leave its relationship to other parameters of the distribution to be determined later. Then the probability of obtaining a particular value of $y$ at this value of $x$ is
$$\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]dy,$$
where $\mu$ is given by Eq. (12-67). The probability of observing this value of $x$ is
$$\frac{1}{\sqrt{2\pi}\,\sigma_x}\exp\left[-\frac{(x-\mu_x)^2}{2\sigma_x^2}\right]dx,$$
so that the probability of observing this particular pair is
$$d\Phi = \frac{1}{2\pi\sigma\sigma_x}\exp\left\{-\frac{(x-\mu_x)^2}{2\sigma_x^2} - \frac{[y-\mu_y-B_0(x-\mu_x)]^2}{2\sigma^2}\right\}dx\,dy.$$
If we now consider the probability of obtaining this same $(x, y)$ pair via a discussion of the line of regression of $x$ on $y$, we obtain
$$d\Phi = \frac{1}{2\pi\sigma'\sigma_y}\exp\left\{-\frac{(y-\mu_y)^2}{2\sigma_y^2} - \frac{[x-\mu_x-b_0(y-\mu_y)]^2}{2\sigma'^2}\right\}dx\,dy,$$
where $\sigma'$ is the standard deviation for the distribution of the values of $x$ at this value of $y$. Since both these expressions refer to the same $(x, y)$ pair, they must be equal. If they are to be equal for every $(x, y)$ pair, not only must the arguments of the exponential functions be equal but, in those arguments, the coefficients of like powers of $(x-\mu_x)$ and $(y-\mu_y)$ must be equal. Therefore we must have, as the reader can show,
$$\frac{1}{\sigma_x^2} + \frac{B_0^2}{\sigma^2} = \frac{1}{\sigma'^2}, \tag{12-68}$$
$$\frac{B_0}{\sigma^2} = \frac{b_0}{\sigma'^2}, \tag{12-69}$$
$$\frac{1}{\sigma_y^2} + \frac{b_0^2}{\sigma'^2} = \frac{1}{\sigma^2}. \tag{12-70}$$
If Eq. (12-62) merely represented an algebraic relationship, there would be no scattering or grouping of the points along a straight line; instead every pair would lie exactly on the same line. Then Eqs. (12-66) would be satisfied and the product $B_0 b_0$ would be equal to unity. This would still be true if the line of $y$ vs. $x$ approached the vertical or the horizontal. On the other hand, if the scatter of the infinite number of pairs were such that the points were uniformly distributed in the plane, both $B_0$ and $b_0$ would be zero, and so would their product.
Thus we are led to define a fifth parameter for the two-dimensional distribution in addition to $\mu_x$, $\sigma_x$, $\mu_y$, and $\sigma_y$. This parameter, called the correlation coefficient, is defined by
$$\rho^2 = B_0 b_0, \tag{12-71}$$
where the value of $\rho^2$ ranges from 0 to 1. When $\rho$ is 0, we say that there is no correlation; when $\rho^2$ is 1, we say that the correlation is perfect.

If Eq. (12-69) is multiplied through by $B_0$ and Eq. (12-71) is substituted into it, we obtain
$$\frac{B_0^2}{\sigma^2} = \frac{\rho^2}{\sigma'^2},$$
so that from Eq. (12-68) we find that
$$\sigma' = \sigma_x\sqrt{1-\rho^2}. \tag{12-72}$$
Taking the square root, with $\rho$ given the same sign as $B_0$, we have
$$B_0 = \frac{\rho\,\sigma}{\sigma_x\sqrt{1-\rho^2}}.$$
Since $\sigma$ and $\sigma_x$ are necessarily positive, and since $-1 \le \rho \le 1$ when $0 \le \rho^2 \le 1$, we can speak of negative correlation if one variable tends to increase as the other decreases, or of positive correlation if they either increase or decrease together. Similarly,
$$\sigma = \sigma_y\sqrt{1-\rho^2}. \tag{12-73}$$
From Eqs. (12-72) and (12-73) we obtain $\sigma'\sigma_y = \sigma\sigma_x$, so that the coefficients of the exponential functions in the two alternative expressions for the probability of observing a particular pair are also equal. Therefore, by using either of these expressions with appropriate substitutions, we find the desired two-dimensional normal distribution for variables in linear correlation:
$$d\Phi = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right]\right\}dx\,dy. \tag{12-74}$$
We must, however, verify the normalization by integrating $d\Phi$ over all values of $x$ and of $y$. The troublesome cross product $(x-\mu_x)(y-\mu_y)$ can be eliminated if we make the substitutions
$$\frac{x-\mu_x}{\sigma_x} - \rho\,\frac{y-\mu_y}{\sigma_y} = u\sqrt{1-\rho^2}, \qquad \frac{y-\mu_y}{\sigma_y} = v.$$
The reader should show that the Jacobian for this change of variables is $\sigma_x\sigma_y\sqrt{1-\rho^2}$, so that
$$\iint_{\text{all } x,\,y} d\Phi = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-u^2/2}\,du \cdot \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-v^2/2}\,dv,$$
which is unity. It is easy to see that if $\rho = 0$, Eq. (12-74) becomes the product of a normal distribution for $x$ and a normal distribution for $y$.
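Inverting the substitution used above gives a direct way to generate pairs from Eq. (12-74): draw independent standard normal $u$ and $v$ and set $y = \mu_y + \sigma_y v$, $x = \mu_x + \sigma_x(\rho v + \sqrt{1-\rho^2}\,u)$. Combining the expression for $B_0$ with Eq. (12-73) gives $B_0 = \rho\sigma_y/\sigma_x$, and Eq. (12-69) then gives $b_0 = \rho\sigma_x/\sigma_y$, so the least-squares slopes of a large sample should approach these values and their product should approach $\rho^2$, as Eq. (12-71) requires. The sketch below is a Monte Carlo illustration using Python's `random` module; the parameter values, seed, and function names are arbitrary choices, not anything from the text.

```python
import math
import random


def sample_pairs(n, mu_x, mu_y, sigma_x, sigma_y, rho, seed=12345):
    """Draw n (x, y) pairs from the two-dimensional normal distribution (12-74)."""
    rng = random.Random(seed)
    root = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Inverse of the normalizing substitution: v fixes y, u adds the scatter in x.
        x = mu_x + sigma_x * (rho * v + root * u)
        y = mu_y + sigma_y * v
        pairs.append((x, y))
    return pairs


def regression_slopes(pairs):
    """Least-squares slopes B (y on x) and b (x on y) for a list of pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    qx2 = sum((x - mean_x) ** 2 for x, _ in pairs)
    qy2 = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / qx2, sxy / qy2


rho, sigma_x, sigma_y = -0.6, 2.0, 0.5
pairs = sample_pairs(20000, 10.0, 3.0, sigma_x, sigma_y, rho)
B, b = regression_slopes(pairs)
print(f"B = {B:.4f}  (B0 = {rho * sigma_y / sigma_x:.4f})")
print(f"b = {b:.4f}  (b0 = {rho * sigma_x / sigma_y:.4f})")
print(f"Bb = {B * b:.4f}  (rho^2 = {rho ** 2:.4f})")
```

Note that $B$ and $b$ are far from reciprocals of each other here, which is the point of Eq. (12-66): only when $\rho^2 = 1$ does $b_0 = 1/B_0$.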
When $\rho = 0$, then, $x$ and $y$ are completely independent. There is no correlation between them, no mutual influence of one on the other, and the $(x, y)$ pairs observed together would tend to fill the $(x, y)$-plane uniformly with points. The easiest way to see the effect of perfect correlation is by looking at Eqs. (12-72) and (12-73). If $\rho = \pm 1$, then $\sigma = \sigma' = 0$. That is, for a particular value of $x$ there is no scatter in possible values of $y$, and vice versa. All $(x, y)$ pairs lie exactly on the same line.

The experienced reader will have observed that it would be most unusual for all $(x, y)$ pairs in a real, finite sample to lie exactly on the same straight line. Were it to happen, the most likely explanation would be the one discussed in Section 10-9: the scales used were too coarse. Thus in any real observational situation where a relationship of the form of Eq. (12-62) is known or expected to exist between two variables, it is highly unlikely that a finite-sized sample will show perfect correlation. We are then faced with the problem of calculating a correlation coefficient for such a sample and of discussing its meaning once it is calculated.

The method of calculating the correlation coefficient for a real sample follows naturally from Eq. (12-71). For the case in question the latter is rewritten as
$$r^2 = Bb, \tag{12-75}$$
where $r$ is the correlation coefficient for the sample, $B$ is the least-squares value of the slope of the line of regression of $y$ on $x$, and $b$ is the least-squares value of the slope of the line of regression of $x$ on $y$; all these are for the same set of data, of course. Reference to Eqs. (10-10), (12-18), and (12-20) shows that if the numerator and denominator in the first of these are divided by $n$, the equation can be written as
$$B = \frac{\sum x_i y_i - n\bar x\bar y}{q_x^2}, \tag{12-76}$$
where
$$q_x^2 = \sum (x_i - \bar x)^2.$$
Similarly,
$$b = \frac{\sum x_i y_i - n\bar x\bar y}{q_y^2}.$$
Thus
$$r = \frac{\sum x_i y_i - n\bar x\bar y}{q_x q_y}, \tag{12-77}$$
or, directly in terms of the observations,
$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}. \tag{12-78}$$

A derivation of the distribution of $r$ based on Eq. (12-74) is even more tedious than those we have faced in the previous sections of this chapter. Such a derivation has been given by Uspensky [14], for instance, and will not be reproduced here. Unfortunately, even this derivation leads to a result which depends specifically on the value of the unknown "true" correlation coefficient $\rho$ for the distribution. On the other hand, the more common test, and the one for which tables can be found most easily, is based on a study of the probability of observing a value of $r$ less than or equal to some arbitrary value $R$ when $\rho$ is zero, i.e., when there is no correlation in the parent distribution. We can arrive at this parent distribution by discussing the line of regression of $y$ on $x$, treating $x$ as a nonrandom variable. Then the mean of $y$ at each value of $x$ is given by Eq. (12-63), and we assume the standard deviation $\sigma$ in $y$ to be the same at each value of $x$. The probability of obtaining the $n$ observations is
$$d\Phi = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left[-\frac{\sum_{i=1}^{n}(y_i - A_0 - B_0 x_i)^2}{2\sigma^2}\right]\prod_{i=1}^{n} dy_i. \tag{12-79}$$
The least-squares value of the slope is given by Eq. (12-76); instead of using the least-squares intercept, we prefer to eliminate it through $A = \bar y - B\bar x$. Corresponding to the sums of squares of residuals designated previously by $q^2$, we define
$$Q^2 = \sum (y_i - A - Bx_i)^2 = \sum \left[(y_i - \bar y) - B(x_i - \bar x)\right]^2. \tag{12-80}$$
In the expansion of the right-hand side of Eq. (12-80) we find the term $\sum (x_i - \bar x)(y_i - \bar y)$. The reader can show that this is simply $Bq_x^2$, so that
$$Q^2 = q_y^2 - B^2 q_x^2. \tag{12-81}$$
With this definition of $Q^2$, we see that the sum of squares of the real errors which appears in the exponential function of Eq. (12-79) is
$$\sum \text{error}^2 = Q^2 + q_x^2(B - B_0)^2 + n(\bar y - A_0 - B_0\bar x)^2. \tag{12-82}$$
We can proceed now as previously.
That is, let
$$u_n = Q, \qquad u_{n-1} = \bar y, \qquad u_{n-2} = B, \qquad u_i = \frac{(y_i - \bar y) - B(x_i - \bar x)}{Q}, \quad i = 1, \ldots, n-3.$$
We must now write the old variables in terms of the new ones. The first $n-3$ are easy:
$$y_i = \bar y + B(x_i - \bar x) + Qu_i, \qquad i = 1, \ldots, n-3.$$
By procedures similar to those used in Section 12-5 we find
$$(y_n - \bar y) + (y_{n-1} - \bar y) + (y_{n-2} - \bar y) = -Q\sum u + B\left[(x_n - \bar x) + (x_{n-1} - \bar x) + (x_{n-2} - \bar x)\right],$$
$$x_n(y_n - \bar y) + x_{n-1}(y_{n-1} - \bar y) + x_{n-2}(y_{n-2} - \bar y) = -Q\sum xu + B\left[x_n(x_n - \bar x) + x_{n-1}(x_{n-1} - \bar x) + x_{n-2}(x_{n-2} - \bar x)\right],$$
$$\left[(y_n - \bar y) - B(x_n - \bar x)\right]^2 + \left[(y_{n-1} - \bar y) - B(x_{n-1} - \bar x)\right]^2 + \left[(y_{n-2} - \bar y) - B(x_{n-2} - \bar x)\right]^2 = Q^2\left(1 - \sum u^2\right),$$
where the sums on the right-hand sides of these equations are over the indices $i = 1, \ldots, n-3$. For convenience in writing, let
$$Y_{n-j} = (y_{n-j} - \bar y) - B(x_{n-j} - \bar x), \qquad j = 0, 1, 2.$$
Then the previous three equations become
$$Y_n + Y_{n-1} + Y_{n-2} = -Q\sum u, \tag{12-83}$$
$$x_n Y_n + x_{n-1} Y_{n-1} + x_{n-2} Y_{n-2} = -Q\sum xu, \tag{12-84}$$
$$Y_n^2 + Y_{n-1}^2 + Y_{n-2}^2 = Q^2\left(1 - \sum u^2\right). \tag{12-85}$$
Equations (12-83) and (12-84) can be solved for $Y_{n-1}$ and $Y_{n-2}$ to yield
$$Y_{n-1} = \frac{Y_n(x_n - x_{n-2}) + Q\left(\sum xu - x_{n-2}\sum u\right)}{x_{n-2} - x_{n-1}}, \tag{12-86}$$
$$Y_{n-2} = \frac{Y_n(x_n - x_{n-1}) + Q\left(\sum xu - x_{n-1}\sum u\right)}{x_{n-1} - x_{n-2}}. \tag{12-87}$$
There is no point in doing more algebra than is necessary. Examination of these equations in the light of our previous experience shows that if Eqs. (12-86) and (12-87) are substituted into Eq. (12-85), the result will be of the form
$$M_1 Y_n^2 + 2M_2 Y_n Q + M_3 Q^2 = 0, \tag{12-88}$$
where the coefficients $M_1$, $M_2$, $M_3$ do not contain any of the three quantities $\bar y$, $B$, or $Q$. The solution of Eq. (12-88) will be of the form
$$Y_n = \frac{Q}{M_1}\left(-M_2 \pm \sqrt{M_2^2 - M_1 M_3}\right).$$
But we see from Eqs. (12-86) and (12-87) that if $Y_n$ is proportional to $Q$, then so are $Y_{n-1}$ and $Y_{n-2}$, and $\bar y$, $B$, and $Q$ are not in the constants of proportionality.
In fact, we can now write
$$y_n = \bar y + B(x_n - \bar x) + N_n Q,$$
$$y_{n-1} = \bar y + B(x_{n-1} - \bar x) + N_{n-1} Q,$$
$$y_{n-2} = \bar y + B(x_{n-2} - \bar x) + N_{n-2} Q,$$
where the coefficients of $Q$ contain only the values of $x$ and the variables $u_i$ for $i < n-2$. The Jacobian for the variable change $y_i \to u_i$ is then of the same form as that found in the derivation of the $\chi^2$-distribution; the only difference is in the power of $Q$ which can be divided out of it. Hence we see that the Jacobian can be written as $CQ^{n-3}$, so that Eq. (12-79) becomes
$$d\Phi = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left\{-\frac{1}{2\sigma^2}\left[Q^2 + q_x^2(B - B_0)^2 + n(\bar y - A_0 - B_0\bar x)^2\right]\right\} Q^{n-3}\,d\bar y\,dB\,dQ\;C\prod_{i=1}^{n-3} du_i.$$
Here, as with the derivation of the $\chi^2$-distribution, we will normalize the final distribution of interest rather than derive the normalization factor from an expression for $C$. We note in passing, however, that $\bar y$ is normally distributed, with mean $A_0 + B_0\bar x$ and standard deviation $\sigma/\sqrt{n}$. Furthermore, $B$ is seen to be normally distributed, with mean $B_0$ and standard deviation $\sigma/q_x$. Now, it is indeed $B$ we are interested in, but we wish to find a measure of its distribution which is independent of the unknown $\sigma$. We also recall that two quantities, $A$ and $B$, were determined from the data, so that the number of degrees of freedom is $f = n - 2$. Thus, of all the factors in $d\Phi$, we keep only the variable ones that interest us:
$$d\Phi' \propto Q^{f-1}\exp\left[-\frac{Q^2 + q_x^2(B - B_0)^2}{2\sigma^2}\right]dB\,dQ.$$
We get rid of the unknown $\sigma$ by making the change of variables
$$q = Q, \qquad t = \frac{\sqrt{f}\,q_x(B - B_0)}{Q},$$
where we are led to include the factor $\sqrt{f}$ in the definition of $t$ because the form of $d\Phi'$ shows something close to Student's $t$-distribution coming up; by defining $t$ in this way we can cast $d\Phi'$ immediately into the form of this known distribution. In terms of the new variables the old ones are
$$Q = q, \qquad B = B_0 + \frac{qt}{q_x\sqrt{f}},$$
and the Jacobian is $q/(q_x\sqrt{f})$. If we again preserve only the variable factors of interest, we have
$$d\Phi'' \propto q^f \exp\left[-\frac{q^2}{2\sigma^2}\left(1 + \frac{t^2}{f}\right)\right]dq\,dt.$$
We next integrate over all values of $q$ to obtain the distribution of $t$, which is independent of any particular value of $Q$. We need go no further. Reference to the derivation of Eq. (12-51) and to the definitions of $Q$ and $q_x$ shows that the quantity
$$t = (B - B_0)\left[\frac{f\,q_x^2}{\sum (y_i - A - Bx_i)^2}\right]^{1/2} \tag{12-89}$$
has Student's $t$-distribution. But the right-hand side of Eq. (12-89) is a familiar combination. Equations (11-26), (11-28), (11-29), and (11-32) show that Eq. (12-89) can be written as
$$t = \frac{B - B_0}{\sigma_B}, \tag{12-90}$$
where $\sigma_B$ is the previously defined estimate of the standard deviation of the slope, calculated on the basis of what was called external consistency, for uniformly weighted observations.

Equations (12-89) and (12-90) contain the unknown $B_0$. It was mentioned that the most common correlation test is based on the hypothesis that $B_0$ is zero. That is, we ask how likely it is that the observed $B$ would occur when actually $B_0$ is zero. If it turns out to be very unlikely, the hypothesis is rejected and we conclude that there is correlation. The decision we reach depends on the probability level at which the test is made, i.e., on the degree of risk of making an erroneous decision that we feel we can stand.

Before showing the relationship between this $t$-distribution and the distribution of the correlation coefficient, we will consider a real, physical example. The example is fairly involved, but for that reason it illustrates the use of statistical analyses beyond the level discussed in Chapters 9, 10, and the early sections of Chapter 11, and is offered to encourage the reader to work beyond that level. The example involves the study of a model for the solution of additional oxygen in solid uranium dioxide. We will not burden the reader with the theory involved, but will ask him to take the equations on faith.
One can derive from the theory an equation
$$Y = a + bx, \tag{12-91}$$
where $x$ is the ratio of the number of added atoms of oxygen to the number of uranium atoms present, and $a$ and $b$ are supposed to be constant if the model fits the facts. The function $Y$ turns out to be of the form
$$Y = -\bar F_{O_2} + 2RT\left[\ln\left(\frac{\cdots}{\alpha - x}\right) + C\right] + f(T), \tag{12-92}$$
where $f(T)$ is a known function of the temperature $T$, $C$ and $\alpha$ are constant if the model fits the facts, $R$ is the gas constant, and $-\bar F_{O_2}$ is a quantity measured by the experimenter at various values of $T$ for a given value of $x$, the measurement being repeated at various values of $x$. We wish to see whether the model fits the facts with $\alpha = 1$. There are no restrictions on the values of $C$, $a$, or $b$, though the investigator has ideas as to the ranges in which these numbers should reasonably fall.

What is to be examined here is the fact that $Y$ appears to depend on the temperature according to Eq. (12-92), whereas Eq. (12-91) says that it should be independent of the temperature. If these two facts cannot be brought into agreement, the model is a failure. The problem is examined by calculating $B = dY/dT$ and $\sigma_B$ by the standard least-squares procedures, and $t$ from Eq. (12-90) with the assumption that $B_0 = 0$. The results for a few representative observations are given in Table 12-16, along with the values of $t$ that would be expected to be exceeded 5% and 1% of the time if $B_0$ actually were zero. The hypothesis is verified at neither level for $x$ = 0.026, 0.158, or 0.172; at the 1% level but not at the 5% level for $x$ = 0.112, 0.085, and 0.114; and at both levels for all other values of $x$. This is the conclusion reached by a more detailed examination of a greater mass of data; that is, the model is satisfactory for $x \lesssim 0.08$ and fails for $x \gtrsim 0.08$. This more detailed examination also verifies another conclusion implied above: something went wrong experimentally at $x = 0.026$.
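The test just applied can be sketched in a few lines. The example below is an illustration under stated assumptions, not the computation behind Table 12-16: it computes $B$ from Eq. (12-76), the residual sum $Q^2$ from Eq. (12-81), $\sigma_B = Q/(q_x\sqrt{f})$, and $t$ from Eq. (12-90) with $B_0 = 0$, together with $r$ from Eq. (12-77). For data it borrows the tensile-strength and hardness pairs of Problem 24 at the end of this chapter. A short simulation, with $y$ independent of a fixed set of six $x$ values, checks that $|t|$ exceeds the tabulated $t_{.05} = 2.78$ for $f = 4$ about 5% of the time; the function names and the seed are arbitrary.

```python
import math
import random


def slope_t_test(xs, ys):
    """Slope of y on x, its external-consistency standard deviation, t for B0 = 0, and r."""
    n = len(xs)
    f = n - 2  # degrees of freedom: A and B are both fitted
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    qx2 = sum((x - mean_x) ** 2 for x in xs)
    qy2 = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    B = sxy / qx2                          # Eq. (12-76)
    Q2 = qy2 - B * B * qx2                 # Eq. (12-81): sum of squared residuals
    sigma_B = math.sqrt(Q2 / (f * qx2))    # so that t = B / sigma_B, Eq. (12-90)
    r = sxy / math.sqrt(qx2 * qy2)         # Eq. (12-77)
    return B, sigma_B, B / sigma_B, r, f


def null_rejection_rate(xs, t_crit, trials=20000, seed=2024):
    """Fraction of simulated data sets, with no dependence of y on x, giving |t| >= t_crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ys = [rng.gauss(0.0, 1.0) for _ in xs]
        if abs(slope_t_test(xs, ys)[2]) >= t_crit:
            hits += 1
    return hits / trials


# Problem 24: tensile strength (100 lb/in^2) and Rockwell E hardness
tensile = [377, 247, 348, 298, 287, 292, 345, 380, 257, 258]
hardness = [70, 56, 86, 60, 72, 51, 88, 95, 51, 75]
B, sigma_B, t, r, f = slope_t_test(tensile, hardness)
print(f"B = {B:.4f} +/- {sigma_B:.4f}, t = {t:.2f}, r = {r:.3f}, f = {f}")

# Six fixed x values give f = 4, for which Table 12-16 lists t_.05 = 2.78
print(null_rejection_rate([1, 2, 3, 4, 5, 6], 2.78))
```

For the Problem 24 data this gives $r \approx 0.708$ and $t \approx 2.83$, the values quoted in the answer to that problem; the simulated rejection rate should come out close to 0.05.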
Table 12-16
x       f    dY/dT      t        t.05    t.01    Ref.
0.051   1   -0.122      1.54    12.71   63.66   a
0.078   1    0.681      8.64    12.71   63.66   a
0.112   1   -2.731     34.65    12.71   63.66   a
0.030   6    0.507      0.41     2.45    3.71   b
0.033   3    0.503      2.58     3.18    5.84   b
0.026   4    8.636      4.73     2.78    4.60   c
0.037   2    0.664      1.12     4.30    9.92   c
0.085   4   -1.624      3.84     2.78    4.60   c
0.114   1    0.577     24.36    12.71   63.66   d
0.158   1   -7.624    107.99    12.71   63.66   d
0.172   1   -8.360    348.87    12.71   63.66   d

a. Aronson and Belle, J. Chem. Phys., 29, 151 (1958).
b. Markin and Bones, Atomic Energy Research Establishment Report AERE-R4178 (1962).
c. Kiukkola, Acta Chem. Scand., 16, 327 (1962).
d. Roberts and Walter, J. Inorg. Nuclear Chem., 22, 213 (1961).

We have shown above how to make the usually desired test on problems of correlation by using the $t$-distribution. Nevertheless, the reader should be familiar with the relationship between this distribution and the distribution of the sample correlation coefficient given by Eq. (12-77). If Eq. (12-89) is rewritten in terms of $Q$ and $q_x$ for $B_0 = 0$, it becomes
$$t = \frac{B\sqrt{f}\,q_x}{Q}.$$
From Eqs. (12-76) and (12-77) we find that $Bq_x = rq_y$, so that from Eq. (12-81) we obtain
$$Q = q_y\sqrt{1 - r^2}.$$
Hence
$$t = \frac{r\sqrt{f}}{\sqrt{1 - r^2}}. \tag{12-93}$$
Thus, for example, the critical value of $r$ for $f = 4$ that would be expected to be exceeded 1% of the time for $\rho = 0$, calculated from the corresponding $t_{.01} = 4.60$, is 0.917.

In order to see what the distribution of $r$ looks like, we make, in the $t$-distribution as given by Eq. (12-51), the change of variable corresponding to Eq. (12-93). The result is
$$\psi(r, f)\,dr = \frac{\Gamma\left(\frac{f+1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{f}{2}\right)}\left(1 - r^2\right)^{(f-2)/2}dr,$$
but we will do nothing further with it, since we have accomplished our purpose via the $t$-test.

PROBLEMS

1. For observations distributed normally:
(a) What is the expected number out of 10 such observations that will have values of $X/\sigma$ greater than or equal to unity, and what is the standard deviation in this number?
(b) What is the expected number that will have $|X/\sigma|$ greater than or equal to 0.6745, and what is the standard deviation in this number?
Answer: (a) 1.59 ± 1.16 (b) 5.0 ± 1.6

2. For 10 observations taken from a normal distribution, write the expression from which one would find the distribution function for those having errors such that $-1 < X/\sigma < 1$.
Answer: $C(10, R)\,(0.6827)^R\,(0.3173)^{10-R}$

3. For a pair of fair dice:
(a) What are the probabilities for getting each of the possible sums from 2 to 12 in a single toss?
(b) For 12 tosses, write the expression from which the distribution function for sevens would be calculated.
(c) For 12 tosses, what are the expectation values and standard deviations for sevens, for either four or ten, and for either nine or eleven?
Answer: (a) 2 or 12, 1/36; 3 or 11, 2/36; 4 or 10, 3/36; 5 or 9, 4/36; 6 or 8, 5/36; 7, 6/36 (b) $C(12, R)\,(1/6)^R\,(5/6)^{12-R}$ (c) 7, 2 ± 1.3; 4 or 10, 2 ± 1.3; 9 or 11, 2 ± 1.3

4. You are presented with the following numbers of counts/min for a single radioactive sample: 33.0, 32.2, 32.3, 31.6, 31.0, 32.6, 32.8, 31.1. Assuming that each figure was deduced from approximately the same total count, estimate that count and the time interval that was used.
Answer: $C$ = 1780 counts, $t$ = 55.5 min

5. Ten fair dice are rolled. What is the probability that two of them show an ace, five show either 2 or 3, and three show either 4, 5, or 6?
Answer: 35/(36 × 27)

6. A uniform distribution extends from -5 to +5. Of five readings, what is the probability that none lie between -5 and -4 or between 4 and 5, that one lies between -4 and -3 and one lies between 3 and 4, and that three lie between -3 and +3?
Answer: 0.0432

7. For readings having a uniform probability of occurrence between $-a$ and $+a$, and zero probability of occurrence outside these limits:
(a) Show that the distribution function of the range is
$$f(r, n) = \frac{n(n-1)}{(2a)^n}\,r^{n-2}(2a - r).$$
[Hint: Note that the possible values of $x$, using the notation of Section 12-4, for a fixed value of $r$, range from $-a$ to $a - r$.]
(b) What is the most probable value of the range?
(c) What is the expectation value of the range?
Answer: (b) $[(n-2)/(n-1)](2a)$ (c) $[(n-1)/(n+1)](2a)$

8. For the distribution of Problem 7, and $n = 2$, what values of the range are expected to be exceeded 99% of the time and 1% of the time?
Answer: $R \approx 0.01a$ exceeded 99% of the time, $R = 1.8a$ exceeded 1% of the time

9. Show that the expectation value of $\chi^2$ is $f$ and that the expectation value for the mean-square error in $\chi^2$, i.e., the expectation value of $(\chi^2 - f)^2$, is $2f$. Hence, remembering the definition of $\chi^2$, and using the approximation methods of Chapter 3 for large values of $f$, show that we estimate $\sigma$ for the parent distribution to be $\sqrt{\sum v^2/f}$ and the standard deviation in $\sigma$ to be $\sigma/\sqrt{2f}$. Compare the first of these results with Eqs. (10-24) and (10-26). [Hint: Write $\chi^2 = f \pm \sqrt{2f}$.]

10. In the dart-dropping experiment, 498 observations were used to determine the two quantities $\mu$ and $\sigma$. What is the standard deviation in $\sigma$? Would it appear that the value of $\sigma$ estimated with any ordinary number of observations is very precise?
Answer: 0.011; no

11. If a single coin is tossed $N$ times,
$$\chi^2 = \frac{(N_H - N/2)^2 + (N_T - N/2)^2}{N\left(\frac12\right)\left(\frac12\right)}$$
with $f = 1$, since the number of heads, $N_H$, plus the number of tails, $N_T$, must equal $N$. Consider the data of Table 6-1 for $N$ = 10, 100, 1000. For any of these three cases, can the hypothesis that the coin is good be rejected at the 95% level?
Answer:
N       χ²
10      3.20
100     2.00
1000    1.35
Since $\chi^2_{\text{crit}} = 3.84$, the coin appears to be good.

12. After how many tosses of a single coin would a ratio of $N_H/N = 0.531$ cause one to reject, at the 95% level, the hypothesis that the coin is good?
Answer: 500

13. By making use of the additive property of $\chi^2$, group the first five, the second five, and the third five values of $\chi^2$ from Table 12-5.
(a) What is the value of $f$ for each of the three groups?
(b) Do these values of $\chi^2$ still support the hypothesis that the proposed distribution describes the data satisfactorily?
(c) What is the average of these three values of $\chi^2$?
(d) What is the expectation value of this $\chi^2$ and its standard deviation?
Answer: (a) 5 (b) yes (c) 4.73 (d) $5 \pm \sqrt{10}$

14. During the course of the derivation of the $\chi^2$-distribution it was found that $\bar x$ is normally distributed. Considering the meaning of the quantity $t$ of Section 12-6, what would you expect its distribution to approach for large values of $f$? Verify your expectation. [Hint: Use $\Gamma(n) = (n-1)!$ and Eq. (8-16). Note also that, if $n$ is a large number, $(1 + a/n)^n \to 1 + n(a/n) + \frac{n(n-1)}{2!}(a/n)^2 + \cdots \to e^a$.]
Answer: normal distribution

15. We found from Eq. (12-31) that $\bar x$ is normally distributed, but with the unknown parameters $\mu$ and $\sigma/\sqrt{n}$. Let our estimate of $\sigma$ be called $s$, i.e., $s^2 = q^2/f$, and then show that the quantity
$$t = \frac{\bar x - \mu}{s/\sqrt{n}}$$
has Student's $t$-distribution. [Hint: Change variables from the old $\bar x$ and $q$ to the new $s$ and $t$ and work only with the necessary variable quantities from Eq. (12-31).]

16. In Table B are given the probabilities of $|t|$ exceeding or equaling the table entry. What is the probability that, at $f = 10$, $t < 2.228$?
Answer: 0.975

17. The $\chi^2$-test is a general test for fit, not only as regards the form of an assumed distribution but also as regards its parameters. On this basis the data of Table 12-4 were rejected as coming from a normal distribution defined by $\mu = -0.7229$ and $\sigma = 3.352$. Could these data have come from a normal distribution with $\mu = -0.7229$?
Answer: Yes, since $t$ (see Problem 15 above) $\approx -0.8$ and $P(t < -0.8,\ f = 9) \approx 0.22$.

18.
Several measurements, all made with the same procedure, of the percent zinc in each of three samples of a bronze gave:
I:     9.82   9.70   8.90   10.10   10.67   10.32   9.25   8.98   9.53
II:    9.30   7.96   9.64    8.79    9.80    8.47
III:   9.10   8.74   8.93    9.36   10.68    9.55   8.43   8.25   9.86   7.80
By working at the 95% level, examine the hypothesis that the samples are identical. If they appear to be so, combine the data for the two which appear more certain to be identical and reexamine the hypothesis.
Answer: I and II, $t = 2.065$, $t_{\text{crit}} = 2.160$; II and III, $t = 0.187$, $t_{\text{crit}} = 2.145$; I and III, $t = 1.853$, $t_{\text{crit}} = 2.110$. The hypothesis is accepted. However, for I and (II + III), $t = 2.194$, $t_{\text{crit}} = 2.069$. The hypothesis is rejected: Sample I appears to be different if II and III are accepted as being identical.

19. By evaluating and integrating $\varphi(g, f_x, f_y)\,dg$, verify that, for $f_x = 4$ and $f_y = 2$, there is a 90% chance that
$$0.1441 < \frac{q_x^2/f_x}{q_y^2/f_y} < 19.25,$$
and that, for $f_x = 2$ and $f_y = 4$, there is the same chance that
$$0.0519 < \frac{q_x^2/f_x}{q_y^2/f_y} < 6.94.$$

20. Do the data given in Problem 18 above bear out the statement that all the measurements were made by the same procedure?
Answer: Yes; typically, $F(\text{I}, \text{II}) = 1.41$, $F_{\text{crit}}(5\%) = 3.69$.

21. Consider the distribution derived from the 498 observations in the dart-dropping experiment as being the true parent distribution from which the data of Tables 12-2, 12-3, and 12-4 supposedly were taken. Examine the ratios of the estimates of the variance to the true variance to see if this supposition is borne out. [Hint: Note the columns in Table D for infinite degrees of freedom.]
Answer: For Table 12-2, $F = 0.875$, $F_{\text{crit}} = 0.369$; accept the supposition. For Table 12-3, $F = 2.18$, $F_{\text{crit}} = 1.88$ at 5% and 2.41 at 1%; reject the supposition at the former level, accept it at the latter. For Table 12-4, $F = 0.0436$, $F_{\text{crit}} = 0.232$; reject the supposition even at the 1% level.

22.
Five samples of a bronze, analyzed for percent zinc by each of two different procedures, gave the following results:
Sample no.   Method I   Method II
1             9.14       9.18
2             9.01       9.46
3             9.58       9.60
4             9.38       9.67
5            10.06       9.92
Examine the data for the probabilities that the methods are equivalent and that the samples have the same percent zinc.
Answer:
                     q^2/f    F      F95    F99
Method differences   0.0436   1.58   7.71   21.20
Sample differences   0.2173   7.87   6.39   15.98
Error                0.0276
The methods appear to be equivalent; the assumption that the samples have the same percent zinc is rejected at the 95% level but accepted at the 99% level.

23. Show how to construct a correlation coefficient table from Table C by evaluating $r$ to 3 decimal places for 3, 5, and 10 observations at the 95% and 99% rejection levels.
Answer:
f    95      99
1    0.997   1.000
3    0.878   0.959
8    0.632   0.765

24. The tensile strength in 100 lb/in² and the hardness on the Rockwell E scale were measured for each of ten samples of die-cast aluminum, with the following results:
Tensile strength   377   247   348   298   287   292   345   380   257   258
Hardness            70    56    86    60    72    51    88    95    51    75
Find $r$ from Eq. (12-77) and $t$ from Eq. (12-89), and compare with Eq. (12-93). How likely is it that the hypothesis of no correlation is correct?
Answer: $r = 0.708$, $t = 2.83$. There is a chance of only slightly more than two times in 100 that the observed correlation would occur when, in fact, the variables were uncorrelated.

APPENDIXES AND REFERENCES

APPENDIX 1
NORMALIZATION OF THE NORMAL DISTRIBUTION

The integration of Eq. (8-10) is facilitated by a change of variable. Let $t = hx$. Then
$$1 = \int_{-\infty}^{\infty}\frac{k}{h}e^{-t^2}\,dt = \frac{2k}{h}\int_0^{\infty}e^{-t^2}\,dt.$$
In order to evaluate this integral, let us consider
$$I = \int_0^R e^{-t^2}\,dt,$$
where $R$ is a large number which will later be allowed to approach infinity. Since this is a definite integral, it will have the same numerical value if any other variable is substituted for $t$.
Therefore one can also write
$$I = \int_0^R e^{-x^2}\,dx = \int_0^R e^{-y^2}\,dy.$$
Then
$$I^2 = \int_0^R e^{-x^2}\,dx\int_0^R e^{-y^2}\,dy = \int_0^R\!\!\int_0^R e^{-(x^2+y^2)}\,dx\,dy.$$
To verify the last step, we need only perform the integration indicated by the double integral on the right-hand side; since the variables are separable, the expression will break down into the product of the two integrals. The double integral can be represented geometrically as the volume under a surface of revolution obtained by rotating a normal distribution function about the z-axis, as shown in Fig. A1-1. The volume represented by this integral is in fact that portion of space in the first octant between the surface of revolution, the xy-plane, the yz-plane, and the zx-plane, and is cut off at $x = R$ and $y = R$, as shown in the figure.

FIGURE A1-1

If $R$ is large, we can determine the same volume approximately by means of the following integral:
$$M = \int_0^{\pi/2}\!\!\int_0^R r e^{-r^2}\,dr\,d\theta,$$
which gives that part of the volume bounded by the surface of revolution, the xy-, yz-, and zx-planes, and the circle $r = R$. This integral is always less than $I^2$. If the integration over $r$ is carried out between the limits 0 and $R\sqrt{2}$, we can see from the figure that the volume included will always be greater than $I^2$. That is, if
$$N = \int_0^{\pi/2}\!\!\int_0^{R\sqrt{2}} r e^{-r^2}\,dr\,d\theta,$$
then $M < I^2 < N$. But
$$M = \frac{\pi}{2}\int_0^R r e^{-r^2}\,dr = \frac{\pi}{4}\left(1 - e^{-R^2}\right),$$
which approaches $\pi/4$ when $R \to \infty$. Since we can similarly show that $N$ approaches $\pi/4$ when $R \to \infty$, we find that $I^2 = \pi/4$ and
$$\int_{-\infty}^{\infty} e^{-u^2}\,du = \sqrt{\pi}.$$
Hence, from the first equation of this appendix, $k = h/\sqrt{\pi}$.

APPENDIX 2
EVALUATION OF THE STANDARD DEVIATION FOR THE NORMAL DISTRIBUTION

It was shown in Appendix 1 that
$$\int_0^{\infty} e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}.$$
If one sets $t = hx$ so that $dt = h\,dx$, then
$$\int_0^{\infty} e^{-h^2x^2}\,dx = \frac{\sqrt{\pi}}{2h}.$$
While $h$ is constant for the integration, the result must be true for any value of $h$.
Thus each side of the equation can be differentiated with respect to $h$ to yield
$$\int_0^{\infty} -2hx^2 e^{-h^2x^2}\,dx = -\frac{\sqrt{\pi}}{2h^2}.$$
Hence
$$\int_0^{\infty} x^2 e^{-h^2x^2}\,dx = \frac{\sqrt{\pi}}{4h^3}.$$
This device, by which the value of a difficult definite integral can be derived from a known one, will be used extensively in Appendix 6.

APPENDIX 3
SUMS OF POWERS OF INTEGERS; SOURCE OF MATHEMATICAL TOOLS

A. The sums of powers of the first $n$ integers can be found by the following procedure:
$$\sum_{k=0}^{n}\left[(k+1)^2 - k^2\right] = [1^2 - 0^2] + [2^2 - 1^2] + \cdots + [n^2 - (n-1)^2] + [(n+1)^2 - n^2] = (n+1)^2.$$
Therefore, by expanding a term on the left, we obtain
$$\sum_{k=0}^{n}(2k+1) = 2\sum_{k=0}^{n}k + (n+1) = 2\sum_{k=1}^{n}k + (n+1) = (n+1)^2,$$
from which we get
$$\sum_{k=1}^{n}k = \tfrac{1}{2}n(n+1).$$
Similarly,
$$\sum_{k=0}^{n}\left[(k+1)^3 - k^3\right] = 3\sum_{k=0}^{n}k^2 + 3\sum_{k=0}^{n}k + (n+1) = (n+1)^3.$$
Knowing $\sum_{k=1}^{n}k$, we find that
$$\sum_{k=1}^{n}k^2 = \tfrac{1}{6}n(n+1)(2n+1).$$
By successively increasing the powers of $(k+1)$ and of $k$ on the left-hand side of the starting equation, we can find
$$\sum_{k=1}^{n}k^3 = \tfrac{1}{4}n^2(n+1)^2, \qquad \sum_{k=1}^{n}k^4 = \tfrac{1}{30}n(n+1)(2n+1)(3n^2+3n-1),$$
etc.

B. The sum of the squares of the odd integers up to $2r - 1$ can be found by subtracting the sum of the squares of the even integers up to $2r - 2$ from the sum of the squares of the first $2r - 1$ integers. Note that the numbers 2, 4, 6, 8, ... are twice 1, 2, 3, 4, ..., so that $(2^2 + 4^2 + 6^2 + \cdots) = 4(1^2 + 2^2 + 3^2 + \cdots)$.

C. The reader is referred to the Handbook of Chemistry and Physics (Cleveland: Chemical Rubber), especially the newer editions, as an excellent source of numerical tables, integrals, differentials, general mathematical and statistical formulas, and approximations to various mathematical functions.

APPENDIX 4
WEIGHTS FOR QUANTITIES THAT HAVE BEEN EVALUATED BY THE METHOD OF LEAST SQUARES

From the definition of $\tau_i$ following Eq. (11-16) we see that
$$\begin{aligned}
[a\tau] &= \beta_1[waa] + \beta_2[wab] + \cdots + \beta_q[waq],\\
[b\tau] &= \beta_1[wab] + \beta_2[wbb] + \cdots + \beta_q[wbq],\\
&\;\;\vdots\\
[q\tau] &= \beta_1[waq] + \beta_2[wbq] + \cdots + \beta_q[wqq].
\end{aligned}$$
(A4-1)

Although the numerators of the $\beta$'s were defined as the minors, with appropriate algebraic sign, of the terms in the second column of the determinant which appears explicitly in Eq. (11-12), they are also seen to be the minors, with appropriate sign, of the terms in the second column of the determinant for $\Delta$, shown in Eq. (11-13). The denominators of $[a\tau]$, $[b\tau]$, etc. are $\Delta$; if the numerators are written as determinants, it is seen that all of them except that for $[b\tau]$ have two identical columns and hence are zero [12]. The numerator of $[b\tau]$, however, is seen to be $\Delta$, so that the left-hand sides of all the equations (A4-1) are zero except the second, which is one.

It can now be shown that $\beta_2 = [\tau\tau/w]$. The summation $[\tau\tau/w]$ must be calculated from the values of the $\tau$'s which were given after Eq. (11-16). First, the expressions for $\tau_1, \tau_2, \ldots$ are squared:
$$\begin{aligned}
\tau_1^2 &= (\beta_1 w_1 a_1)^2 + (\beta_2 w_1 b_1)^2 + (\beta_3 w_1 c_1)^2 + \cdots + (\beta_q w_1 q_1)^2 + 2\beta_1\beta_2 w_1^2 a_1 b_1 + 2\beta_1\beta_3 w_1^2 a_1 c_1 + \cdots,\\
\tau_2^2 &= (\beta_1 w_2 a_2)^2 + (\beta_2 w_2 b_2)^2 + (\beta_3 w_2 c_2)^2 + \cdots + (\beta_q w_2 q_2)^2 + 2\beta_1\beta_2 w_2^2 a_2 b_2 + 2\beta_1\beta_3 w_2^2 a_2 c_2 + \cdots,
\end{aligned}$$
etc. (A4-2)

Next, the first of these equations is divided by $w_1$, the second by $w_2$, and so on, and the results are added to give
$$\left[\frac{\tau\tau}{w}\right] = \beta_1^2[waa] + \beta_2^2[wbb] + \beta_3^2[wcc] + \cdots + \beta_q^2[wqq] + 2\beta_1\beta_2[wab] + 2\beta_1\beta_3[wac] + \cdots.$$
Note that the first term is the sum of all the first terms of Eqs. (A4-2), the second term is the sum of all the second terms, etc. This equation can be rearranged in the following form:
$$\begin{aligned}
\left[\frac{\tau\tau}{w}\right] = {}& \beta_1\{\beta_1[waa] + \beta_2[wab] + \cdots + \beta_q[waq]\}\\
&+ \beta_2\{\beta_1[wab] + \beta_2[wbb] + \cdots + \beta_q[wbq]\}\\
&+ \beta_3\{\beta_1[wac] + \beta_2[wbc] + \cdots + \beta_q[wcq]\}\\
&+ \cdots\\
&+ \beta_q\{\beta_1[waq] + \beta_2[wbq] + \cdots + \beta_q[wqq]\}.
\end{aligned}$$
But our examination of Eqs. (A4-1) has shown that the first line of the expression on the right-hand side is zero, the second line is $\beta_2 \times 1$, and the third and all subsequent lines are equal to zero.
The entire expression on the right-hand side then reduces to $\beta_2$, so that $[\tau\tau/w] = \beta_2$.

APPENDIX 5
A DEFINITE INTEGRAL RELATED TO THE SPREAD BETWEEN PAIRS OF OBSERVATIONS

Let
$$I(w) = \int_{-\infty}^{\infty} e^{-(u^2+uw)}\,du.$$
Since the first power of $u$ appears in the integral, we cannot use the device of letting this integral be twice the integral from 0 to $\infty$. We must begin by integrating by parts with $U = e^{-u^2}$, $dV = e^{-uw}\,du$. Then
$$I(w) = -\frac{2}{w}\int_{-\infty}^{\infty} u\,e^{-(u^2+uw)}\,du,$$
where the integrated part is zero. We next use the device of Appendix 2 and note that
$$\frac{dI}{dw} = -\int_{-\infty}^{\infty} u\,e^{-(u^2+uw)}\,du.$$
Thus
$$I(w) = \frac{2}{w}\frac{dI}{dw},$$
the solution of which is
$$I(w) = Ce^{w^2/4},$$
where $\ln C$ is the constant of integration. To evaluate $C$, we note that
$$I(0) = C = \int_{-\infty}^{\infty} e^{-u^2}\,du,$$
which, according to Appendix 1, is $\sqrt{\pi}$. Thus
$$I(w) = \sqrt{\pi}\,e^{w^2/4}.$$

APPENDIX 6
CERTAIN DEFINITE INTEGRALS

The methods used in Appendix 2 can be extended to the general case. Consider
$$\int_0^{\infty} e^{-au}\,du = \frac{1}{a}.$$
If both sides of this equation are successively differentiated with respect to $a$, one obtains
$$\int_0^{\infty} u\,e^{-au}\,du = \frac{1}{a^2},\qquad \int_0^{\infty} u^2 e^{-au}\,du = \frac{1\cdot 2}{a^3},\qquad \int_0^{\infty} u^3 e^{-au}\,du = \frac{1\cdot 2\cdot 3}{a^4}.$$
It seems clear that if this process were carried out some indefinite number of times, one would find that
$$\int_0^{\infty} u^j e^{-au}\,du = \frac{j!}{a^{j+1}}.$$
This result is useful provided that $j$ is an integer, but we are just as likely to be faced with cases where $j$ is half-integral. To examine the latter, let $k$ be an integer, $j = k - \tfrac12$, and $u = v^2$. Then
$$\int_0^{\infty} u^j e^{-au}\,du = 2\int_0^{\infty} v^{2k} e^{-av^2}\,dv.$$
Setting $h^2 = a$, we obtain from the second equation of Appendix 2,
$$\int_0^{\infty} e^{-av^2}\,dv = \frac{\sqrt{\pi}}{2a^{1/2}}.$$
By again using the device of successive differentiation, we get
$$\int_0^{\infty} v^{2k} e^{-av^2}\,dv = \frac{\sqrt{\pi}\cdot 1\cdot 3\cdot 5\cdots(2k-1)}{2^{k+1}a^{(2k+1)/2}}.$$
Thus when $j$ is half-integral,
$$\int_0^{\infty} u^j e^{-au}\,du = \frac{(\tfrac12)(\tfrac32)(\tfrac52)\cdots(j)\,\sqrt{\pi}}{a^{j+1}},$$
which, we see, bears a certain resemblance to the result for integral $j$. The leading factors are $j(j-1)(j-2)\cdots(\tfrac12)$, which is like $j!$ except that all the factors are half-integral.
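The two results just obtained, $\int_0^\infty u^j e^{-au}\,du = j!/a^{j+1}$ for integral $j$ and $(\tfrac12)(\tfrac32)\cdots(j)\sqrt{\pi}/a^{j+1}$ for half-integral $j$, are easy to check numerically. The Python sketch below is an illustration added here, not part of the original text. It uses the same substitution $u = v^2$ so that the integrand is smooth at the origin, then applies composite Simpson's rule on a truncated range. The cut-off where $av^2 = 60$ and the panel count are assumptions chosen for accuracy, and the helper names `simpson`, `moment_numeric`, and `moment_closed` are hypothetical.

```python
import math


def simpson(f, lo, hi, n=4000):
    """Composite Simpson's rule on [lo, hi] using an even number n of panels."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def moment_numeric(j, a):
    """Integral of u**j * exp(-a*u) over [0, inf), computed after setting u = v**2.

    The substitution gives the integrand 2 v**(2j+1) exp(-a v**2), which is
    smooth at v = 0 for integral and half-integral j. The range is cut off
    where a*v**2 = 60 (an assumed truncation point, not taken from the text).
    """
    vmax = math.sqrt(60.0 / a)
    return simpson(lambda v: 2.0 * v ** (2 * j + 1) * math.exp(-a * v * v), 0.0, vmax)


def moment_closed(j, a):
    """j!/a**(j+1) for integral j; (1/2)(3/2)...(j) sqrt(pi)/a**(j+1) otherwise."""
    if float(j).is_integer():
        return math.factorial(int(j)) / a ** (j + 1)
    product = math.sqrt(math.pi)
    factor = 0.5
    while factor <= j:  # multiply by 1/2, 3/2, ..., j
        product *= factor
        factor += 1.0
    return product / a ** (j + 1)


if __name__ == "__main__":
    a = 2.0
    for j in (0, 1, 2, 3, 0.5, 1.5, 2.5):
        print(j, moment_numeric(j, a), moment_closed(j, a))
```

Each printed pair should agree closely. Both closed forms also coincide with `math.gamma(j + 1) / a ** (j + 1)`, which anticipates the single notation introduced next.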
It would be convenient to have one notation to describe both these cases; such a notation in fact exists, and it is called the gamma function. The complete discussion of this function certainly is not a proper topic for this book, but those properties of it which are needed for the present application can be described easily. The gamma function of a number $n$ is written $\Gamma(n)$. For any positive $n$, $\Gamma(n+1) = n\Gamma(n)$. Furthermore,
$$\Gamma(1) = 1,\qquad \Gamma(\tfrac12) = \sqrt{\pi}.$$
Thus if $n$ is an integer,
$$\Gamma(n+1) = n\Gamma(n) = n(n-1)\Gamma(n-1) = n(n-1)(n-2)\Gamma(n-2),\ \text{etc.},$$
so that $\Gamma(n+1) = n!$. If $n$ is half-integral, then
$$\Gamma(n+1) = n(n-1)\Gamma(n-1) = n(n-1)(n-2)\Gamma(n-2) = n(n-1)(n-2)\cdots(\tfrac12)\Gamma(\tfrac12) = n(n-1)\cdots(\tfrac12)(\sqrt{\pi}).$$
Thus for $j$ either integral or half-integral
$$\int_0^{\infty} u^j e^{-au}\,du = \frac{\Gamma(j+1)}{a^{j+1}},$$
and the various normalizing factors for the distributions of Chapter 12 which are related to the $\chi^2$ distribution can be described easily in terms of the gamma function whether the number of degrees of freedom is even or odd.

APPENDIX 7
MULTIPLE INTEGRATION: JACOBIANS

Suppose that one wished to use the rate of working to determine the total work done by gravity on a falling body when the body falls from a height $h$ to ground level. The rate of working, which is the product of force and velocity, is $(-mg)v$, so that the desired result would be
$$W = -\int_0^t mgv\,dt,$$
where $t$ is the time required to fall the distance $h$. As an introduction to the subject of this appendix, we choose to work this problem in the following way. Instead of using $t$ as an integration variable, we will use the distance above ground. This is given by
$$s = h - \tfrac12 gt^2,$$
from which we obtain
$$t = \left[\frac{2(h-s)}{g}\right]^{1/2} \qquad \text{(A7-1)}$$
and
$$v = \frac{ds}{dt} = -gt = -[2g(h-s)]^{1/2}. \qquad \text{(A7-2)}$$
It is the use of Eqs. (A7-1) and (A7-2) which constitutes the main point of this example.
Having them enables us to write
$$W = -mg\int_h^0 v(s)\,\frac{dt}{ds}\,ds;$$
i.e., we replace $v$ by an expression involving a new variable of integration, multiply it by the derivative of the old variable with respect to the new variable, and append the differential of the latter. The limits of integration are, of course, changed appropriately. To complete this example:
$$\frac{dt}{ds} = -\frac{1}{[2g(h-s)]^{1/2}},\qquad W = mg\int_0^h \frac{[2g(h-s)]^{1/2}}{[2g(h-s)]^{1/2}}\,ds = mgh.$$
Now suppose that the function to be integrated is one of two variables. The problem has suddenly become much more complex, so much more so, in fact, that proofs and the general discussions for any number of variables are far beyond the scope of this book. On the other hand, procedures for developing and using the results of such general discussion can be described in an illustrative and suggestive way. Consider
$$F = \iint f(x, y)\,dx\,dy,$$
and suppose that, for some reason, we wish to carry out the integration over plane polar coordinates rather than over the plane cartesian coordinates. The two equations which correspond to the single Eq. (A7-1) of the previous illustration are
$$x = r\cos\theta,\qquad y = r\sin\theta.$$
Note that the old variables are written in terms of the new.

FIGURE A7-1

It is sufficient for our present purposes to say that partial differentiation is the differentiation of a function with respect to one designated variable, while the other variables are held constant. Thus in our case, $\partial x/\partial r = \cos\theta$ and $\partial x/\partial\theta = -r\sin\theta$. If the reader likes, he may consider these as slopes. They tell us how fast $x$ changes per unit change in $r$ at a constant value of $\theta$, and vice versa.
Hence for small changes $dr$ and $d\theta$ we obtain the following changes $dx$ in $x$ and $dy$ in $y$, to the first order of small quantities:
$$dx = \frac{\partial x}{\partial r}dr + \frac{\partial x}{\partial\theta}d\theta,\qquad dy = \frac{\partial y}{\partial r}dr + \frac{\partial y}{\partial\theta}d\theta.$$
Solution of these by the method of determinants yields
$$dr = J^{-1}\begin{vmatrix} dx & \partial x/\partial\theta\\ dy & \partial y/\partial\theta \end{vmatrix},\qquad d\theta = J^{-1}\begin{vmatrix} \partial x/\partial r & dx\\ \partial y/\partial r & dy \end{vmatrix},$$
where the Jacobian $J(x, y; r, \theta)$, also designated by $\partial(x, y)/\partial(r, \theta)$, or $J(r, \theta)$, or $J$, is
$$J = \begin{vmatrix} \partial x/\partial r & \partial x/\partial\theta\\ \partial y/\partial r & \partial y/\partial\theta \end{vmatrix}.$$
For the coordinate change of our example, $J = r$,
$$dr = \cos\theta\,dx + \sin\theta\,dy,\qquad d\theta = r^{-1}(\cos\theta\,dy - \sin\theta\,dx); \qquad \text{(A7-3)}$$
the last equation of (A7-3) is better written as
$$r\,d\theta = \cos\theta\,dy - \sin\theta\,dx. \qquad \text{(A7-4)}$$
Let us refer now to Fig. A7-1, where the short dashed lines are there only as visual aids. Points $P_1$ and $P_2$ are two arbitrary points separated by some $dx$ in the x-direction and $dy$ in the y-direction. These two directions are perpendicular to each other, and the product $dx\,dy$ is a proper differential area. It is seen that the $dr$ and $r\,d\theta$ shown in the figure are, to the first order of small quantities, just those given by Eqs. (A7-3) and (A7-4). To the same order they are mutually perpendicular, and their product is a proper differential area. While the two areas $dx\,dy$ and $r\,d\theta\,dr$ are not equal (and there is no reason why they should be, so long as the mutual perpendicularity between the distance elements of a given pair is maintained), the distance between $P_1$ and $P_2$ is maintained, as it should be. That is, if the reader likes algebra, he can show that
$$[(dr)^2 + (r\,d\theta)^2]^{1/2} = [(dx)^2 + (dy)^2]^{1/2}.$$
Thus it turned out that $dx\,dy$ is to be replaced by $J\,d\theta\,dr$, and, of course, the $x$ and $y$ in the original integrand are to be replaced by $r\cos\theta$ and $r\sin\theta$, respectively.

Let us mention one more similar example. This very important one is the change from three-dimensional cartesian coordinates to spherical coordinates, illustrated in Fig. A7-2. Here
$$x = r\sin\theta\cos\phi,\qquad y = r\sin\theta\sin\phi,\qquad z = r\cos\theta,$$
$$J = \begin{vmatrix} \partial x/\partial r & \partial x/\partial\theta & \partial x/\partial\phi\\ \partial y/\partial r & \partial y/\partial\theta & \partial y/\partial\phi\\ \partial z/\partial r & \partial z/\partial\theta & \partial z/\partial\phi \end{vmatrix} = r^2\sin\theta.$$
FIGURE A7-2

Thus
$$dx\,dy\,dz \to r^2\sin\theta\,dr\,d\theta\,d\phi = (dr)(r\,d\theta)(r\sin\theta\,d\phi).$$
As with the previous example, it could be found that
$$[(dr)^2 + (r\,d\theta)^2 + (r\sin\theta\,d\phi)^2]^{1/2} = [(dx)^2 + (dy)^2 + (dz)^2]^{1/2}.$$

APPENDIX 8
CERTAIN DEFINITE INTEGRALS OF TRIGONOMETRIC FUNCTIONS

For the sake of convenience, we will write $f_x = m$, $f_y = n$, $f = m + n$ in order to discuss the evaluation of
$$\int_0^{\pi/2}(\cos\theta)^{m-1}(\sin\theta)^{n-1}\,d\theta. \qquad \text{(A8-1)}$$
In any good integral table, such as the one referred to in Appendix 3, we can find that
$$\int(\cos\theta)^{m-1}(\sin\theta)^{n-1}\,d\theta = \frac{(\cos\theta)^{m-2}(\sin\theta)^{n}}{m+n-2} + \frac{m-2}{m+n-2}\int(\cos\theta)^{m-3}(\sin\theta)^{n-1}\,d\theta.$$
For the limiting values 0 and $\pi/2$ the integrated part is zero. This expression can be applied successively to (A8-1), but the results will be different depending on whether $m$ and $n$ are even or odd integers. If $m$ is even, the result is
$$\frac{(m-2)(m-4)\cdots(4)(2)}{(f-2)(f-4)\cdots(n+4)(n+2)}\int_0^{\pi/2}\cos\theta\,(\sin\theta)^{n-1}\,d\theta.$$
The numerator and denominator of the coefficient of the integral each has $(m-2)/2$ factors. The integral is easily evaluated; it is
$$\int_0^1(\sin\theta)^{n-1}\,d(\sin\theta) = \frac{1}{n}.$$
Thus the result can be written as
$$\frac{\tfrac12(m-2)\cdot\tfrac12(m-4)\cdots(2)(1)}{2\cdot\tfrac12(f-2)\cdot\tfrac12(f-4)\cdots\tfrac12(n+4)\cdot\tfrac12(n+2)\cdot\tfrac12 n}.$$
From the discussion of the gamma function in Appendix 6, we see that we can write the numerator as $\Gamma(m/2)$. Furthermore, regardless of whether $n$ is even or odd, we can multiply the numerator and denominator by $\Gamma(n/2)$, the first factor of which is $(n-2)/2$. Each factor in the denominator after the first is one less than the one preceding it; to multiply by $\Gamma(n/2)$ continues this progression past $n/2$. If $n$ is even, then so is $f$; if $n$ is odd, then $f$ is also odd. In either case, the final result is
$$\int_0^{\pi/2}(\cos\theta)^{m-1}(\sin\theta)^{n-1}\,d\theta = \frac{\Gamma(m/2)\,\Gamma(n/2)}{2\,\Gamma(f/2)}.$$
We shall not repeat the argument for the case of odd $m$. If the reader cares to, he can verify that the result given above is good for any combination of odd or even $m$ and $n$.
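The closing formula of Appendix 8, $\int_0^{\pi/2}(\cos\theta)^{m-1}(\sin\theta)^{n-1}\,d\theta = \Gamma(m/2)\,\Gamma(n/2)/[2\,\Gamma(f/2)]$ with $f = m + n$, is the kind of statement the reader is invited to verify for odd as well as even $m$ and $n$. The Python sketch below is an illustrative numerical check, not the authors' method. It assumes that composite Simpson's rule with 2000 panels is accurate enough for these smooth integrands, and the function names are hypothetical.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def trig_integral_numeric(m, n):
    """Integral of cos(t)**(m-1) * sin(t)**(n-1) over [0, pi/2], by quadrature."""
    return simpson(lambda t: math.cos(t) ** (m - 1) * math.sin(t) ** (n - 1),
                   0.0, math.pi / 2)


def trig_integral_gamma(m, n):
    """Gamma(m/2) Gamma(n/2) / (2 Gamma(f/2)) with f = m + n."""
    f = m + n
    return math.gamma(m / 2) * math.gamma(n / 2) / (2 * math.gamma(f / 2))


if __name__ == "__main__":
    # Every combination of odd and even m and n from 1 to 5.
    for m in range(1, 6):
        for n in range(1, 6):
            print(m, n, trig_integral_numeric(m, n), trig_integral_gamma(m, n))
```

For $m = n = 1$ the formula gives $\Gamma(\tfrac12)^2/[2\Gamma(1)] = \pi/2$, and for $m = n = 2$ it gives $\tfrac12$, both of which can be confirmed by elementary integration.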
APPENDIX 9
SOLUTION OF SIMULTANEOUS EQUATIONS

The following method for solving simultaneous equations is an extension of the process of successive elimination of the unknowns, which is used in the algebraic solution of small numbers of equations. It can therefore be applied to any set of equations that has a solution. Here we will designate the unknowns by $x_i$ and write the equations as:
$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= z_1,\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= z_2,\\
&\;\;\vdots\\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= z_n.
\end{aligned}$$
Numerical accuracy is best maintained if the unknowns are eliminated in the order of increasing absolute magnitude. It is supposed that the equations have been written so that, to the best of one's knowledge, $|x_1| < |x_2| < \cdots < |x_n|$, and the method will be described so that $x_1$ is eliminated first, $x_2$ is eliminated second, and so on. If one has a set of equations for which it is judged that reordering is necessary to satisfy this requirement, the reordering need only be done within each equation for general use of the method. Whenever the equations possess the symmetry exhibited by least-squares normal equations when they are set down in the order in which one would naturally arrange them, so that $a_{ij} = a_{ji}$, and reordering seems desirable, then the equations should also be reordered so that this symmetry is kept. It will be seen that the existence of this symmetry reduces the labor of calculation.

We next set up the pattern shown in Table A9-1, noting as we do so that $a_{ij}$ has been put into the $(j, i)$ position. The equation in the upper right-hand corner of this pattern means that $x_1$ is to be eliminated from the equations; we also note that this equation will be used for evaluating $x_1$ after all the other $x_i$ have been found.
Table A9-1

         z₁     z₂     z₃    ···    zₙ    |  x₁ = (1/a₁₁)(z₁ − a₁₂x₂ − ··· − a₁ₙxₙ)
                                          |
  A      a₁₁    a₂₁    a₃₁   ···    aₙ₁   |  −a₂₁/a₁₁   −a₃₁/a₁₁   −a₄₁/a₁₁   ···   −aₙ₁/a₁₁
         a₁₂    a₂₂    a₃₂   ···    aₙ₂   |      1          0          0       ···      0
         a₁₃    a₂₃    a₃₃   ···    aₙ₃   |      0          1          0       ···      0       B
         ···                              |                                     ···
         a₁ₙ    a₂ₙ    a₃ₙ   ···    aₙₙ   |      0          0          0       ···      1

After $x_1$ is eliminated we will have a new set of $(n-1)$ equations in $(n-1)$ unknowns. Thus we need to show the elimination process only once; the same process can be applied for any value of $n$. We will designate the new values of $z$ with primes and use $b_{ij}$ for the new coefficients of the remaining $x$'s. The values of $z_i'$ are found by summing products of the terms across the row of $z_i$ and down a column in part B; we use subscripts in such a way as to show that it is $x_1$ that has been eliminated. Thus
$$z_2' = z_1\left(-\frac{a_{21}}{a_{11}}\right) + z_2(1) + z_3(0) + \cdots = z_1\left(-\frac{a_{21}}{a_{11}}\right) + z_2,$$
$$z_3' = z_1\left(-\frac{a_{31}}{a_{11}}\right) + z_3,\qquad \ldots,\qquad z_n' = z_1\left(-\frac{a_{n1}}{a_{11}}\right) + z_n.$$
The values of $b_{ij}$ are found in a similar way except that the entries in rows of A, starting with the second row, are used instead of the values of $z_i$. For example,
$$b_{22} = a_{12}\left(-\frac{a_{21}}{a_{11}}\right) + a_{22},\qquad b_{n2} = a_{12}\left(-\frac{a_{n1}}{a_{11}}\right) + a_{n2},$$
$$b_{33} = a_{13}\left(-\frac{a_{31}}{a_{11}}\right) + a_{33},\qquad b_{nn} = a_{1n}\left(-\frac{a_{n1}}{a_{11}}\right) + a_{nn}.$$
Then the new pattern is:

         z₂′    z₃′   ···    zₙ′   |  x₂ = (1/b₂₂)(z₂′ − b₂₃x₃ − ··· − b₂ₙxₙ)
                                   |
         b₂₂    b₃₂   ···    bₙ₂   |  −b₃₂/b₂₂   −b₄₂/b₂₂   ···   −bₙ₂/b₂₂
         b₂₃    b₃₃   ···    bₙ₃   |      1          0       ···      0
         ···                       |                         ···
         b₂ₙ    b₃ₙ   ···    bₙₙ   |      0          0       ···      1

In order to make the ending of this process clear we now set down the complete solution for three equations in three unknowns:

         z₁     z₂     z₃    |  x₁ = (1/a₁₁)(z₁ − a₁₂x₂ − a₁₃x₃)
         a₁₁    a₂₁    a₃₁   |  −a₂₁/a₁₁   −a₃₁/a₁₁
         a₁₂    a₂₂    a₃₂   |      1          0
         a₁₃    a₂₃    a₃₃   |      0          1

         z₂′    z₃′          |  x₂ = (1/b₂₂)(z₂′ − b₂₃x₃)
         b₂₂    b₃₂          |  −b₃₂/b₂₂
         b₂₃    b₃₃          |      1

         z₃″                 |  x₃ = z₃″/c₃₃
         c₃₃

where
$$z_2' = z_2 - z_1\left(\frac{a_{21}}{a_{11}}\right),\qquad z_3' = z_3 - z_1\left(\frac{a_{31}}{a_{11}}\right),$$
$$b_{22} = a_{22} - a_{12}\left(\frac{a_{21}}{a_{11}}\right),\qquad b_{23} = a_{23} - a_{13}\left(\frac{a_{21}}{a_{11}}\right),$$
$$b_{32} = a_{32} - a_{12}\left(\frac{a_{31}}{a_{11}}\right),\qquad b_{33} = a_{33} - a_{13}\left(\frac{a_{31}}{a_{11}}\right),$$
$$z_3'' = z_3' - z_2'\left(\frac{b_{32}}{b_{22}}\right),\qquad c_{33} = b_{33} - b_{23}\left(\frac{b_{32}}{b_{22}}\right).$$
After having determined $x_3$ we now go back up the equations included in the pattern to determine successively $x_2$ and $x_1$.
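The pattern of Table A9-1 amounts to eliminating the first remaining unknown, repeating the same step on the reduced set, and then substituting back up the chain. The Python sketch below follows that description under stated assumptions: `a[i][j]` holds the coefficient of $x_{j+1}$ in equation $i+1$, and no reordering or pivoting is done, so the caller is assumed to have already ordered the unknowns as recommended above. The names `eliminate_first` and `solve` are hypothetical. This is a plain illustration of successive elimination, not a reproduction of the authors' hand-computation layout.

```python
def eliminate_first(a, z):
    """One pass of the pattern: eliminate x1 from equations 2..n.

    Returns (b, z_prime): coefficients and right-hand sides of the n - 1
    remaining equations in x2 ... xn.
    """
    n = len(z)
    # Part B, first row: the multipliers -a_i1/a_11 for i = 2..n.
    mult = [-a[i][0] / a[0][0] for i in range(1, n)]
    # z'_i = z_1(-a_i1/a_11) + z_i
    z_prime = [z[0] * m + z[i] for i, m in zip(range(1, n), mult)]
    # b_ij = a_1j(-a_i1/a_11) + a_ij
    b = [[a[0][j] * m + a[i][j] for j in range(1, n)]
         for i, m in zip(range(1, n), mult)]
    return b, z_prime


def solve(a, z):
    """Successive elimination followed by back substitution."""
    if len(z) == 1:
        return [z[0] / a[0][0]]
    b, z_prime = eliminate_first(a, z)
    rest = solve(b, z_prime)  # x2 ... xn from the reduced set
    # x1 = (1/a_11)(z_1 - a_12 x2 - ... - a_1n xn)
    x1 = (z[0] - sum(a[0][j] * rest[j - 1] for j in range(1, len(z)))) / a[0][0]
    return [x1] + rest


if __name__ == "__main__":
    # A small symmetric set (a_ij = a_ji), like least-squares normal equations.
    a = [[4.0, 2.0, 1.0], [2.0, 5.0, 3.0], [1.0, 3.0, 6.0]]
    z = [11.0, 21.0, 25.0]
    print(solve(a, z))               # the exact solution is x = (1, 2, 3)
    print(eliminate_first(a, z)[0])  # the reduced coefficients remain symmetric
```

The second printed matrix illustrates the property $b_{ij} = b_{ji}$ for symmetric input that the text turns to next.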
We can now see the reason for ordering the normal equations in the manner described above and in the text. If the unknowns are reordered in each equation and the equations are not also reordered, the symmetry will be lost. It can be seen that if $a_{ij} = a_{ji}$, then $b_{ij} = b_{ji}$, and so on; the labor of calculation is reduced accordingly.

REFERENCES

1. F. Mosteller, R. E. K. Rourke, and G. B. Thomas, Jr., Probability with Statistical Applications. Reading, Mass.: Addison-Wesley, 1961.
2. T. C. Fry, Probability and Its Engineering Uses. Princeton, N.J.: Van Nostrand, 1928.
3. P. G. Hoel, Introduction to Mathematical Statistics. New York: Wiley, 1962, p. 68.
4. R. T. Birge, Phys. Rev., 40, 207 (1932).
5. A. G. Worthing and J. Geffner, Treatment of Experimental Data. New York: Wiley, 1944, p. 148.
6. D. P. Bartlett, Method of Least Squares. Cambridge, Mass.: Harvard Cooperative Society, 1933, p. 11.
7. J. E. Mayer and M. G. Mayer, Statistical Mechanics. New York: Wiley, 1940, pp. 431–433. This text includes several useful mathematical appendixes.
8. N. Arley and K. R. Buch, Introduction to the Theory of Probability and Statistics. New York: Wiley, 1950.
9. B. W. Lindgren, Statistical Theory. New York: Macmillan, 1962.
10. R. T. Birge, Rev. Mod. Phys., 19, 298 (1947).
11. J. Topping, Errors of Observation and Their Treatment. London: Institute of Physics, 1955.
12. For a discussion of the properties of determinants see any good college algebra text; see, for instance, J. B. Rosenbach and E. A. Whitman, College Algebra. Boston: Ginn, 1933, pp. 304f.
13. W. J. Youden, Statistical Methods for Chemists. New York: Wiley, 1951.
14. J. V. Uspensky, Introduction to Mathematical Probability. New York: McGraw-Hill, 1937, p. 339. Note that the meanings of ρ and r which we have used are interchanged in this text. We have used ρ in the same way as Arley and Buch [8], p. 44.
TABLES

[Table A]
Table B
χ²-VALUES

   df    P=0.99      0.98      0.95      0.90      0.80      0.70
    1  0.000157  0.000628   0.00393    0.0158    0.0642     0.148
    2    0.0201    0.0404     0.103     0.211     0.446     0.713
    3     0.115     0.185     0.352     0.584     1.005     1.424
    4     0.297     0.429     0.711     1.064     1.649     2.195
    5     0.554     0.752     1.145     1.610     2.343     3.000
    6     0.872     1.134     1.635     2.204     3.070     3.828
    7     1.239     1.564     2.167     2.833     3.822     4.671
    8     1.646     2.032     2.733     3.490     4.594     5.527
    9     2.088     2.532     3.325     4.168     5.380     6.393
   10     2.558     3.059     3.940     4.865     6.179     7.267
   11     3.053     3.609     4.575     5.578     6.989     8.148
   12     3.571     4.178     5.226     6.304     7.807     9.034
   13     4.107     4.765     5.892     7.042     8.634     9.926
   14     4.660     5.368     6.571     7.790     9.467    10.821
   15     5.229     5.985     7.261     8.547    10.307    11.721
   16     5.812     6.614     7.962     9.312    11.152    12.624
   17     6.408     7.255     8.672    10.085    12.002    13.531
   18     7.015     7.906     9.390    10.865    12.857    14.440
   19     7.633     8.567    10.117    11.651    13.716    15.352
   20     8.260     9.237    10.851    12.443    14.578    16.266
   21     8.897     9.915    11.591    13.240    15.445    17.182
   22     9.542    10.600    12.338    14.041    16.314    18.101
   23    10.196    11.293    13.091    14.848    17.187    19.021
   24    10.856    11.992    13.848    15.659    18.062    19.943
   25    11.524    12.697    14.611    16.473    18.940    20.867
   26    12.198    13.409    15.379    17.292    19.820    21.792
   27    12.879    14.125    16.151    18.114    20.703    22.719
   28    13.565    14.847    16.928    18.939    21.588    23.647
   29    14.256    15.574    17.708    19.768    22.475    24.577
   30    14.953    16.306    18.493    20.599    23.364    25.508

   df    P=0.50      0.30      0.20      0.10      0.05      0.02      0.01
    1     0.455     1.074     1.642     2.706     3.841     5.412     6.635
    2     1.386     2.408     3.219     4.605     5.991     7.824     9.210
    3     2.366     3.665     4.642     6.251     7.815     9.837    11.341
    4     3.357     4.878     5.989     7.779     9.488    11.668    13.277
    5     4.351     6.064     7.289     9.236    11.070    13.388    15.086
    6     5.348     7.231     8.558    10.645    12.592    15.033    16.812
    7     6.346     8.383     9.803    12.017    14.067    16.622    18.475
    8     7.344     9.524    11.030    13.362    15.507    18.168    20.090
    9     8.343    10.656    12.242    14.684    16.919    19.679    21.666
   10     9.342    11.781    13.442    15.987    18.307    21.161    23.209
   11    10.341    12.899    14.631    17.275    19.675    22.618    24.725
   12    11.340    14.011    15.812    18.549    21.026    24.054    26.217
   13    12.340    15.119    16.985    19.812    22.362    25.472    27.688
   14    13.339    16.222    18.151    21.064    23.685    26.873    29.141
   15    14.339    17.322    19.311    22.307    24.996    28.259    30.578
   16    15.338    18.418    20.465    23.542    26.296    29.633    32.000
   17    16.338    19.511    21.615    24.769    27.587    30.995    33.409
   18    17.338    20.601    22.760    25.989    28.869    32.346    34.805
   19    18.338    21.689    23.900    27.204    30.144    33.687    36.191
   20    19.337    22.775    25.038    28.412    31.410    35.020    37.566
   21    20.337    23.858    26.171    29.615    32.671    36.343    38.932
   22    21.337    24.939    27.301    30.813    33.924    37.659    40.289
   23    22.337    26.018    28.429    32.007    35.172    38.968    41.638
   24    23.337    27.096    29.553    33.196    36.415    40.270    42.980
   25    24.337    28.172    30.675    34.382    37.652    41.566    44.314
   26    25.336    29.246    31.795    35.563    38.885    42.856    45.642
   27    26.336    30.319    32.912    36.741    40.113    44.140    46.963
   28    27.336    31.391    34.027    37.916    41.337    45.419    48.278
   29    28.336    32.461    35.139    39.087    42.557    46.693    49.588
   30    29.336    33.530    36.250    40.256    43.773    47.962    50.892

For degrees of freedom greater than 30, the expression $\sqrt{2\chi^2} - \sqrt{2n' - 1}$ may be used as a normal deviate with unit variance, where $n'$ is the number of degrees of freedom. (This Table is taken from Table III of R. A. Fisher, Statistical Methods for Research Workers, 12th ed., 1954, published by Oliver & Boyd Ltd., Edinburgh, by permission of the author and publishers.)

Table C
t TEST OF SIGNIFICANCE BETWEEN TWO SAMPLE MEANS (x̄₁ AND x̄₂)

   df    *P=0.9       0.8       0.7       0.6       0.5       0.4
    1     0.158     0.325     0.510     0.727     1.000     1.376
    2     0.142     0.289     0.445     0.617     0.816     1.061
    3     0.137     0.277     0.424     0.584     0.765     0.978
    4     0.134     0.271     0.414     0.569     0.741     0.941
    5     0.132     0.267     0.408     0.559     0.727     0.920
    6     0.131     0.265     0.404     0.553     0.718     0.906
    7     0.130     0.263     0.402     0.549     0.711     0.896
    8     0.130     0.262     0.399     0.546     0.706     0.889
    9     0.129     0.261     0.398     0.543     0.703     0.883
   10     0.129     0.260     0.397     0.542     0.700     0.879
   11     0.129     0.260     0.396     0.540     0.697     0.876
   12     0.128     0.259     0.395     0.539     0.695     0.873
   13     0.128     0.259     0.394     0.538     0.694     0.870
   14     0.128     0.258     0.393     0.537     0.692     0.868
   15     0.128     0.258     0.393     0.536     0.691     0.866
   16     0.128     0.258     0.392     0.535     0.690     0.865
   17     0.128     0.257     0.392     0.534     0.689     0.863
   18     0.127     0.257     0.392     0.534     0.688     0.862
   19     0.127     0.257     0.391     0.533     0.688     0.861
   20     0.127     0.257     0.391     0.533     0.687     0.860
   21     0.127     0.257     0.391     0.532     0.686     0.859
   22     0.127     0.256     0.390     0.532     0.686     0.858
   23     0.127     0.256     0.390     0.532     0.685     0.858
   24     0.127     0.256     0.390     0.531     0.685     0.857
   25     0.127     0.256     0.390     0.531     0.684     0.856
   26     0.127     0.256     0.390     0.531     0.684     0.856
   27     0.127     0.256     0.389     0.531     0.684     0.855
   28     0.127     0.256     0.389     0.530     0.683     0.855
   29     0.127     0.256     0.389     0.530     0.683     0.854
   30     0.127     0.256     0.389     0.530     0.683     0.854
    ∞   0.12566   0.25335   0.38532   0.52440   0.67449   0.84162

   df     P=0.3       0.2       0.1      0.05      0.02      0.01
    1     1.963     3.078     6.314    12.706    31.821    63.657
    2     1.386     1.886     2.920     4.303     6.965     9.925
    3     1.250     1.638     2.353     3.182     4.541     5.841
    4     1.190     1.533     2.132     2.776     3.747     4.604
    5     1.156     1.476     2.015     2.571     3.365     4.032
    6     1.134     1.440     1.943     2.447     3.143     3.707
    7     1.119     1.415     1.895     2.365     2.998     3.499
    8     1.108     1.397     1.860     2.306     2.896     3.355
    9     1.100     1.383     1.833     2.262     2.821     3.250
   10     1.093     1.372     1.812     2.228     2.764     3.169
   11     1.088     1.363     1.796     2.201     2.718     3.106
   12     1.083     1.356     1.782     2.179     2.681     3.055
   13     1.079     1.350     1.771     2.160     2.650     3.012
   14     1.076     1.345     1.761     2.145     2.624     2.977
   15     1.074     1.341     1.753     2.131     2.602     2.947
   16     1.071     1.337     1.746     2.120     2.583     2.921
   17     1.069     1.333     1.740     2.110     2.567     2.898
   18     1.067     1.330     1.734     2.101     2.552     2.878
   19     1.066     1.328     1.729     2.093     2.539     2.861
   20     1.064     1.325     1.725     2.086     2.528     2.845
   21     1.063     1.323     1.721     2.080     2.518     2.831
   22     1.061     1.321     1.717     2.074     2.508     2.819
   23     1.060     1.319     1.714     2.069     2.500     2.807
   24     1.059     1.318     1.711     2.064     2.492     2.797
   25     1.058     1.316     1.708     2.060     2.485     2.787
   26     1.058     1.315     1.706     2.056     2.479     2.779
   27     1.057     1.314     1.703     2.052     2.473     2.771
   28     1.056     1.313     1.701     2.048     2.467     2.763
   29     1.055     1.311     1.699     2.045     2.462     2.756
   30     1.055     1.310     1.697     2.042     2.457     2.750
    ∞   1.03643   1.28155   1.64485   1.95996   2.32634   2.57582

* P is the probability of having t this large or larger in size by chance. (This Table is taken from Table IV of R. A. Fisher, Statistical Methods for Research Workers, 12th ed., 1954, published by Oliver & Boyd Ltd., Edinburgh, by permission of the author and publishers.)

Table D
F TEST FOR EQUALITY OF VARIANCES*

Rows: degrees of freedom for lesser mean square. Columns: degrees of freedom for greater mean square.

  df      1      2      3      4      5      6      7      8      9     10     11
   1    161    200    216    225    230    234    237    239    241    242    243
       4052   4999   5403   5625   5764   5859   5928   5981   6022   6056   6082
   2  18.51  19.00  19.16  19.25  19.30  19.33  19.36  19.37  19.38  19.39  19.40
      98.49  99.01  99.17  99.25  99.30  99.33  99.34  99.36  99.38  99.40  99.41
   3  10.13   9.55   9.28   9.12   9.01   8.94   8.88   8.84   8.81   8.78   8.76
      34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.34  27.23  27.13
   4   7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.93
      21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.66  14.54  14.45
   5   6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.78   4.74   4.70
      16.26  13.27  12.06  11.39  10.97  10.67  10.45  10.27  10.15  10.05   9.96
   6   5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   4.03
      13.74  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.79
   7   5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.63   3.60
      12.25   9.55   8.45   7.85   7.46   7.19   7.00   6.84   6.71   6.62   6.54
   8   5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.34   3.31
      11.26   8.65   7.59   7.01   6.63   6.37   6.19   6.03   5.91   5.82   5.74
   9   5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.13   3.10
      10.56   8.02   6.99   6.42   6.06   5.80   5.62   5.47   5.35   5.26   5.18
  10   4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.97   2.94
      10.04   7.56   6.55   5.99   5.64   5.39   5.21   5.06   4.95   4.85   4.78
  11   4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90   2.86   2.82
       9.65   7.20   6.22   5.67   5.32   5.07   4.83   4.74   4.63   4.54   4.46
  12   4.75   3.88   3.49   3.26   3.11   3.00   2.92   2.85   2.80   2.76   2.72
       9.33   6.93   5.95   5.41   5.06   4.82   4.65   4.50   4.39   4.30   4.22
  13   4.67   3.80   3.41   3.18   3.02   2.92   2.84   2.77   2.72   2.67   2.63
       9.07   6.70   5.74   5.20   4.86   4.62   4.44   4.30   4.19   4.10   4.02
  14   4.60   3.74   3.34   3.11   2.96   2.85   2.77   2.70   2.65   2.60   2.56
       8.86   6.51   5.56   5.03   4.69   4.46   4.28   4.14   4.03   3.94   3.86
  15   4.54   3.68   3.29   3.06   2.90   2.79   2.70   2.64   2.59   2.55   2.51
       8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89   3.80   3.73
  16   4.49   3.63   3.24   3.01   2.85   2.74   2.66   2.59   2.54   2.49   2.45
       8.53   6.23   5.29   4.77   4.44   4.20   4.03   3.89   3.78   3.69   3.61

* This table gives values of F which one would expect to exceed by chance only 5% (first line of each pair) and 1% (second line) of the time. (Reprinted, by permission, from G. W. Snedecor, Statistical Methods, 5th ed., Iowa State University Press, 1956.)
TABLES 237 Degrees of freedom for greater mean square 12 14 16 20 24 30 40 50 75 100 200 500 00 244 245 246 248 249 250 251 252 253 253 254 254 254 6106 6142 6169 6208 6234 6258 6286 6302 6323 6334 6352 6361 6366 19.41 19.42 19.43 19.44 19.45 19.46 19.47 19.47 19.48 19.49 19.49 19.50 19.50 99.42 99.43 99.44 99.45 99.46 99.47 99.48 99.48 99.49 99.49 99.49 99.50 99.50 8.74 8.71 8.69 8.66 8.64 8.62 8.60 8.58 8.57 8.56 8.54 8.54 8.53 27.05 26.92 26.83 26.69 26.60 26.50 26.41 26.35 26.27 26.23 26.18 26.14 26.12 5.91 5.87 5.84 5.80 5.77 5.74 5.71 5.70 5.68 5.66 5.65 5.64 5.63 14.37 14.24 14.15 14.02 13.93 13.83 13.74 13.69 13.61 13.57 13.52 13.48 13.46 4.68 4.64 4.60 4.56 4.53 4.50 4.46 4.44 4.42 4.40 4.38 4.37 4.36 9.89 9.77 9.68 9.55 9.47 9.38 9.29 9.24 9.17 9.13 9.07 9.04 9.02 4.00 3.96 3.92 3.87 3.84 3.81 3.77 3.75 3.72 3.71 3.69 3.68 3.67 7.72 7.60 7.52 7.39 7.31 7.23 7.14 7.09 7.02 6.99 6.94 6.90 6.88 3.57 3.52 3.49 3.44 3.41 3.38 3.34 3.32 3.29 3.28 3.25 3.24 3.23 6.47 6.35 6.27 6.15 6.07 5.98 5.90 5.85 5.78 5.75 5.70 5.67 5.65 3.28 3.23 3.20 3.15 3.12 3.08 3.05 3.03 3.00 2.98 2.96 2.94 2.93 5.67 5.56 5.48 5.36 5.28 5.20 5.11 5.06 5.00 4.96 4.91 4.88 4.86 3.07 3.02 2.98 2.93 2.90 2.86 2.82 2.80 2.77 2.76 2.73 2.72 2.71 5.11 5.00 4.92 4.80 4.73 4.64 4.56 4.51 4.45 4.41 4.36 4.33 4.31 2.91 2.86 2.82 2.77 2.74 2.70 2.67 2.64 2.61 2.59 2.56 2.55 2.54 4.71 4.60 4.52 4.41 4.33 4.25 4.17 4.12 4.05 4.01 3.96 3.93 3.91 2.79 2.74 2.70 2.65 2.61 2.57 2.53 2.50 2.47 2.45 2.42 2.41 2.40 4.40 4.29 4.21 4.10 4.02 3.94 3.86 3.80 3.74 3.70 3.66 3.62 3.60 2.69 2.64 2.60 2.54 2.50 2.46 2.42 2.40 2.36 2.35 2.32 2.31 2.30 4.16 4.05 3.98 3.86 3.78 3.70 3.61 3.56 3.49 3.46 3.41 3.38 3.36 2.60 2.55 2.51 2.46 2.42 2.38 2.34 2.32 2.28 2.26 2.24 2.22 2.21 3.96 3.85 3.78 3.67 3.59 3.51 3.42 3.37 3.30 3.27 3.21 3.18 3.16 2.53 2.48 2.44 2.39 2.35 2.31 2.27 2.24 2.21 2.19 2.16 2.14 2.13 3.80 3.70 3.62 3.51 3.43 3.34 3.26 3.21 3.14 3.11 3.06 3.02 3.00 2.48 2.43 2.39 2.33 2.29 2.25 2.21 2.18 2.15 
2.12 2.10 2.08 2.07 3.67 3.56 3.48 3.36 3.29 3.20 3.12 3.07 3.00 2.97 2.92 2.89 2.87 2.42 2.37 2.33 2.28 2.24 2.20 2.16 2.13 2.09 2.07 2.04 2.02 2.01 3.55 3.45 3.37 3.25 3.18 3.10 3.01 2.96 2.89 2.86 2.80 2.77 2.75 238 TABLES Table D (continued) Degrees of freedom for lesser mean square Degrees of freedom for greater mean square 1 2 3 4 5 6 7 8 9 10 11 17 4.45 3.59 3.20 2.96 2.81 2.70 2.62 2.55 2.50 2.45 2.41 8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 3.59 3.52 18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.37 8.28 6.01 5.09 4.58 4.25 4.01 3.85 3.71 3.60 3.51 3.44 19 4.38 3.52 3.13 2.90 2.74 2.63 2.55 2.48 2.43 2.38 2.34 8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43 3.36 20 4.35 3.49 3.10 2.87 2.71 2.60 2.52 2.45 2.40 2.35 2.31 8.10 5.85 4.94 4.43 4.10 3.87 3.71 3.56 3.45 3.37 3.30 21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32 2.28 8.02 5.78 4.87 4.37 4.04 3.81 3.65 3.51 3.40 3.31 3.24 22 4.30 3.44 3.05 2.82 2.66 2.55 2.47 2.40 2.35 2.30 2.26 7.94 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26 3.18 23 4.28 3.42 3.03 2.80 2.64 2.53 2.45 2.38 2.32 2.28 2.24 7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.30 3.21 3.14 24 4.26 3.40 3.01 2.78 2.62 2.51 2.43 2.36 2.30 2.26 2.22 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.25 3.17 3.09 25 4.24 3.38 2.99 2.76 2.60 2.49 2.41 2.34 2.28 2.24 2.20 7.77 5.57 4.68 4.18 3.86 3.63 3.46 3.32 3.21 3.13 3.05 26 4.22 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22 2.18 7.7,2 5.53 4.64 4.14 3.82 3.59 3.42 3.29 3.17 3.09 3.02 27 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.30 2.25 2.20 2.16 7.68 5.49 4.60 4.11 3.79 3.56 3.39 3.26 3.14 3.06 2.98 28 4.20 3.34 2.95 2.71 2.56 2.44 2.36 2.29 2.24 2.19 2.15 7.64 5.45 4.57 4.07 3.76 3.53 3.36 3.23 3.11 3.03 2.95 29 4.18 3.33 2.93 2.70 2.54 2.43 2.35 2.28 2.22 2.18 2.14 7.60 5.42 4.54 4.04 3.73 3.50 3.33 3.20 3.08 3.00 2.92 30 4.17 3.32 2.92 2.69 2.53 2.42 2.34 2.27 2.21 2.16 2.12 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.06 2.98 2.90 32 4.15 3.30 2.90 2.67 2.51 2.40 2.32 2.25 2.19 2.14 2.10 
7.50 5.34 4.46 3.97 3.66 3.42 3.25 3.12 3.01 2.94 2.86 34 4.13 3.28 2.88 2.65 2.49 2.38 2.30 2.23 2.17 2.12 2.08 7.44 5.29 4.42 3.93 3.61 3.38 3.21 3.08 2.97 2.89 2.82 36 4.11 3.26 2.86 2.63 2.48 2.36 2.28 2.21 2.15 2.10 2.06 7.39 5.25 4.38 3.89 3.58 3.35 3.18 3.04 2.94 2.86 2.78 38 4.10 3.25 2.85 2.62 2.46 2.35 2.26 2.19 2.14 2.09 2.05 7.35 5.21 4.34 3.86 3.54 3.32 3.15 3.02 2.91 2.82 2.75 40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.07 2.04 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.88 2.80 2.73 42 4.07 3.22 2.83 2.59 2.44 2.32 2.24 2.17 2.11 2.06 2.02 7.27 5.15 4.29 3.80 3.49 3.26 3.10 2.96 2.86 2.77 2.70 44 4.06 3.21 2.82 2.58 2.43 2.31 2.23 2.16 2.10 2.05 2.01 7.24 5.12 4.26 3.78 3.46 3.24 3.07 2.94 2.84 2.75 2.68 TABLES 239 Degrees of freedom for greater mean square 12 14 16 20 24 30 40 50 75 100 200 500 00 2.38 3.45 2.33 3.35 2.29 3.27 2.23 3.16 2.19 3.08 2.15 3.00 2.11 2.92 2.08 2.86 2.04 2.79 2.02 2.76 1.99 2.70 1.97 2.67 1.96 2.65 2.34 3.37 2.29 3.27 2.25 3.19 2.19 3.07 2.15 3.00 2.11 2.91 2.07 2.83 2.04 2.78 2.00 2.71 1.98 2.68 1.95 2.62 1.93 2.59 1.92 2.57 2.31 3.30 2.26 3.19 2.21 3.12 2.15 3.00 2.11 2.92 2.07 2.84 2.02 2.76 2.00 2.70 1.96 2.63 1.94 2.60 1.91 2.54 1.90 2.51 1.88 2.49 2.28 3.23 2.23 3.13 2.18 3.05 2.12 2.94 2.08 2.86 2.04 2.77 1.99 2.69 1.96 2.63 1.92 2.56 1.90 2.53 1.87 2.47 1.85 2.44 1.84 2.42 2.25 3.17 2.20 3.07 2.15 2.99 2.09 2.88 2.05 2.80 2.00 2.72 1.96 2.63 1.93 2.58 1.89 2.51 1.87 2.47 1.84 2.42 1.82 2.38 1.81 2.36 2.23 3.12 2.18 3.02 2.13 2.94 2.07 2.83 2.03 2.75 1.98 2.67 1.93 2.58 1.91 2.53 1.87 2.46 1.84 2.42 1.81 2.37 1.80 2.33 1.78 2.31 2.20 3.07 2.14 2.97 2.10 2.89 2.04 2.78 2.00 2.70 1.96 2.62 1.91 2.53 1.88 2.48 1.84 2.41 1.82 2.37 1.79 2.32 1.77 2.28 1.76 2.26 2.18 3.03 2.13 2.93 2.09 2.85 2.02 2.74 1.98 2.66 1.94 2.58 1.89 2.49 1.86 2.44 1.82 2.36 1.80 2.33 1.76 2.27 1.74 2.23 1.73 2.21 2.16 2.99 2.11 2.89 2.06 2.81 2.00 2.70 1.96 2.62 1.92 2.54 1.87 2.45 1.84 2.40 1.80 2.32 1.77 2.29 1.74 2.23 1.72 2.19 1.71 2.17 
2.15 2.96 2.10 2.86 2.05 2.77 1.99 2.66 1.95 2.58 1.90 2.50 1.85 2.41 1.82 2.36 1.78 2.28 1.76 2.25 1.72 2.19 1.70 2.15 1.69 2.13 2.13 2.93 2.08 2.83 2.03 2.74 1.97 2.63 1.93 2.55 1.88 2.47 1.84 2.38 1.80 2.33 1.76 2.25 1.74 2.21 1.71 2.16 1.68 2.12 1.67 2.10 2.12 2.90 2.06 2.80 2.02 2.71 1.96 2.60 1.91 2.52 1.87 2.44 1.81 2.35 1.78 2.30 1.75 2.22 1.72 2.18 1.69 2.13 1.67 2.09 1.65 2.06 2.10 2.87 2.05 2.77 2.00 2.68 1.94 2.57 1.90 2.49 1.85 2.41 1.80 2.32 1.77 2.27 1.73 2.19 1.71 2.15 1.68 2.10 1.65 2.06 1.64 2.03 2.09 2.84 2.04 2.74 1.99 2.66 1.93 2.55 1.89 2.47 1.84 2.38 1.79 2.29 1.76 2.24 1.72 2.16 1.69 2.13 1.66 2.07 1.64 2.03 1.62 2.01 2.07 2.80 2.02 2.70 1.97 2.62 1.91 2.51 1.86 2.42 1.82 2.34 1.76 2.25 1.74 2.20 169 2.12 1.67 2.08 1.64 2.02 1.61 1.98 1.59 1.96 2.05 2.76 2.00 2.66 1.95 2.58 1.89 2.47 1.84 2.38 1.80 2.30 1.74 2.21 1.71 2.15 1.67 2.08 1.64 2.04 1.61 1.98 1.59 1.94 1.57 1.91 2.03 2.72 1.89 2.62 1.93 2.54 1.87 2.43 1.82 2.35 1.78 2.26 1.72 2.17 1.69 2.12 1.65 2.04 1.62 2.00 1.59 1.94 1.56 1.90 1.55 1.87 2.02 2.69 1.96 2.59 1.92 2.51 1.85 2.40 1.80 2.32 1.76 2.22 1.71 2.14 1.67 2.08 1.63 2.00 1.60 1.97 1.57 1.90 1.54 1.86 1.53 1.84 2.00 2.66 1.95 2.56 1.90 2.49 1.84 2.37 1.79 2.29 1.74 2.20 1.69 2.11 1.66 2.05 1.61 1.97 1.59 1.94 1.55 1.88 1.53 1.84 1.51 1.81 1.99 2.64 1.94 2.54 1.89 2.46 1.82 2.35 1.78 2.26 1.73 2.17 1.68 2.08 1.64 2.02 1.60 1.94 1.57 1.91 1.54 1.85 1.51 1.80 1.49 1.78 1.98 2.62 1.92 2.52 1.88 2.44 1.81 2.32 1.76 2.24 1.72 2.15 1.66 2.06 1.63 2.00 1.58 1.92 1.56 1.88 1.52 1.82 1.50 1.78 1.48 1.75 240 TABLES Table D (continued) Degrees of freedom for Degrees of freedom for greater mean square lesser mean square 1 2 3 4 5 6 7 8 9 10 11 46 4.05 7.21 3.20 5.10 2.81 4.24 2.57 3.76 2.42 3.44 2.30 3.22 2.22 3.05 2.14 2.92 2.09 2.82 2.04 2.73 2.00 2.66 48 4.04 7.19 3.19 5.08 2.80 4.22 2.56 3.74 2.41 3.42 2.30 3.20 2.21 3.04 2.14 2.90 2.08 2.80 2.03 2.71 1.99 2.64 50 4.03 7.17 3.18 5.06 2.79 4.20 2.56 3.72 2.40 3.41 2.29 3.18 2.20 3.02 
2.13 2.88 2.07 2.78 2.02 2.70 1.98 2.62 55 4.02 7.12 3.17 5.01 2.78 4.16 2.54 3.68 2.38 3.37 2.27 3.15 2.18 2.98 2.11 2.85 2.05 2.75 2.00 2.66 1.97 2.59 60 4.00 7.08 3.15 4.98 2.76 4.13 2.52 3.65 2.37 3.34 2.25 3.12 2.17 2.95 2.10 2.82 2.04 2.72 1.99 2.63 1.95 2.56 65 3.99 7.04 3.14 4.95 2.75 4.10 2.51 3.62 2.36 3.31 2.24 3.09 2.15 2.93 2.08 2.79 2.02 2.70 1.98 2.61 1.94 2.54 70 3.98 7.01 3.13 4.92 2.74 4.08 2.50 3.60 2.35 3.29 2.23 3.07 2.14 2.91 2.07 2.77 2.01 2.67 1.97 2.59 1.93 2.51 80 3.96 6.96 3.11 4.88 2.72 4.04 2.48 3.56 2.33 3.25 2.21 3.04 2.12 2.87 2.05 2.74 1.99 2.64 1.95 2.55 1.91 2.48 100 3.94 6.90 3.09 4.82 2.70 3.98 2.46 3.51 2.30 3.20 2.19 2.99 2.10 2.82 2.03 2.69 1.97 2.59 1.92 2.51 1.88 2.43 125 3.92 6.84 3.07 4.78 2.68 3.94 2.44 3.47 2.29 3.17 2.17 2.95 2.08 2.79 2.01 2.65 1.95 2.56 1.90 2.47 1.86 2.40 150 3.91 6.81 3.06 4.75 2.67 3.91 2.43 3.44 2.27 3.14 2.16 2.92 2.07 2.76 2.00 2.62 1.94 2.53 1.89 2.44 1.85 2.37 200 3.89 6.76 3.04 4.71 2.65 3.88 2.41 3.41 2.26 3.11 2.14 2.90 2.05 2.73 1.98 2.60 1.92 2.50 1.87 2.41 1.83 2.34 400 3.86 6.70 3.02 4.66 2.62 3.83 2.39 3.36 2.23 3.06 2.12 2.85 2.03 2.69 1.96 2.55 1.90 2.46 1.85 2.37 1.81 2.29 1000 3.85 6.66 3.00 4.62 2.61 3.80 2.38 3.34 2.22 3.04 2.10 2.82 2.02 2.66 1.95 2.53 1.89 2.43 1.84 2.34 1.80 2.26 00 3.84 6.64 2.99 4.60 2.60 3.78 2.37 3.32 2.21 3.02 2.09 2.80 2.01 2.64 1.94 2.51 1.88 2.41 1.83 2.32 1.79 2.24 TABLES 241 Deg rees of freedom for greater mean square 12 14 16 20 24 30 40 50 75 100 200 500 00 1.97 2.60 1.91 2.50 1.87 2.42 1.80 2.30 1.75 2.22 1.71 2.13 1.65 2.04 1.62 1.98 1.57 1.90 1.54 1.86 1.51 1.80 1.48 1.76 1.46 1.72 1.96 2.58 1.90 2.48 1.86 2.40 1.79 2.28 1.74 2.20 1.70 2.11 1.64 2.02 1.61 1.96 1.56 1.88 1.53 1.84 1.50 1.78 1.47 1.73 1.45 1.70 1.95 2.56 1.90 2.46 1.85 2.39 1.78 2.26 1.74 2.18 1.69 2.10 1.63 2.00 1.60 1.94 1.55 1.86 1.52 1.82 1.48 1.76 1.46 1.71 1.44 1.68 1.93 2.53 1.88 2.43 1.83 2.35 1.76 2.23 1.72 2.15 1.67 2.06 1.61 1.96 1.58 1.90 1.52 1.82 1.50 1.78 1.46 1.71 
1.43 1.66 1.41 1.64 1.92 2.50 1.86 2.40 1.81 2.32 1.75 2.20 1.70 2.12 1.65 2.03 1.59 1.93 1.56 1.87 1.50 1.79 1.48 1.74 1.44 1.68 1.41 1.63 1.39 1.60 1.90 2.47 1.85 2.37 1.80 2.30 1.73 2.18 1.68 2.09 1.63 2.00 1.57 1.90 1.54 1.84 1.49 1.76 1.46 1.71 1.42 1.64 1.39 1.60 1.37 1.56 1.89 2.45 1.84 2.35 1.79 2.28 1.72 2.15 1.67 2.07 1.62 1.98 1.56 1.88 1.53 1.82 1.47 1.74 1.45 1.69 1.40 1.62 1.37 1.56 1.35 1.53 1.88 2.41 1.82 2.32 1.77 2.24 1.70 2.11 1.65 2.03 1.60 1.94 1.54 1.84 1.51 1.78 1.45 1.70 1.42 1.65 1.38 1.57 1.35 1.52 1.32 1.49 1.85 2.36 1.79 2.26 1.75 2.19 1.68 2.06 1.63 1.98 1.57 1.89 1.51 1.79 1.48 1.73 1.42 1.64 1.39 1.59 1.34 1.51 1.30 1.46 1.28 1.43 1.83 2.33 1.77 2.23 1.72 2.15 1.65 2.03 1.60 1.94 1.55 1.85 1.49 1.75 1.45 1.68 1.39 1.59 1.36 1.54 1.31 1.46 1.27 1.40 1.25 1.37 1.82 2.30 1.76 2.20 1.71 2.12 1.64 2.00 1.59 1.91 1.54 1.83 1.47 1.72 1.44 1.66 1.37 1.56 1.34 1.51 1.29 1.43 1.25 1.37 1.22 1.33 1.80 2.28 1.74 2.17 1.69 2.09 1.62 1.97 1.57 1.88 1.52 1.79 1.45 1.69 1.42 1.62 1.35 1.53 1.32 1.48 1.26 1.39 1.22 1.33 1.19 1.28 1.78 2.23 1.72 2.12 1.67 2.04 1.60 1.92 1.54 1.84 1.49 1.74 1.42 1.64 1.38 1.57 1.32 1.47 1.28 1.42 1.22 1.32 1.16 1.24 1.13 1.19 1.76 2.20 1.70 2.09 1.65 2.01 1.58 1.89 1.53 1.81 1.47 1.71 1.41 1.61 1.36 1.54 1.30 1.44 1.26 1.38 1.19 1.28 1.13 1.19 1.08 1.11 1.75 2.18 1.69 2.07 1.64 1.99 1.57 1.87 1.52 1.79 1.46 1.69 1.40 1.59 1.35 1.52 1.28 1.41 1.24 1.36 1.17 1.25 1.11 1.15 1.00 1.00 INDEX Accuracy (precision), 4, 6f, 29, 31 vs. 
precision, 1, 6, 31, 142 Analysis of variance; see F- distribution Approximations (methods), 13, 15f, 20, 69f, 76, lOlf ; see also Binomial expansion, Taylor's expansion Average (arithmetic), 8, 43, 45, 48, 51, 56, 60, 65, 73f, 84, 94, 97f, 107f, 111, 113, 127f, 156, 158, 162f, 168, 172f, 177, 182f, 189, 193f, 201f Average deviation, 77 Bar graph; see Histogram Binomial distribution, 42f, 50, 56, 64f, 72, 74f, 77f, 144f, 149f, 158 Binomial expansion, 13, 17, 20, 41, 46, 145 Chauvenet's criterion; see Rejection of observations Chi-square distribution and testing, 157f, 172, 200f Combinations (unordered groups), 39f, 49f, 149 Computers, use of, 3f, lOf, 24, 90f Condition equations, 98f, 114f Consistency of results, 127f, 132f, 138f, 141, 144, 197; see also Chi-square, ^-distribution, Range, Rejection, Student's t distribution Correlation coefficient, 188f sample, 193, 204 Curve fitting, 23f, 29f ; see also Mathematical functions black thread method of, 24f , 29f Curve plotting, 21f, 48 residual, 26, 29 rules for, 22, 25f Degrees of freedom, 106, 115, 163f, 166f, 171, 174, 178f, 181f, 185f, 196, 199, 201f Desk calculators, use of, 4, 11, 24, 89f, 94, 113 Determinants, 14, 91, 100, 124, 126, 161f, 219f ; see also Variable, changes of in integration Distribution, 42, 51f; see also Frequency distribution, Multinomial distribution, Probability, Universe asymmetric, 45, 47, 52, 55f, 70, 78, 111, 147, 178f; see also Binomial, Chi-square, F-, Poisson, and Range distributions most probable, 61f, 83 noncontinuous ; see Binomial distribution, Poisson distribution symmetric, 45, 52, 55, 60f, 63, 65f, 77f ; see also Binomial, Normal, Rectangular, and Student's t distributions 243 244 INDEX Distribution function, 59f, 152f, 155, 164, 200 one-dimensional, 143f, 189 two-dimensional, 144, 188f Error bars, 135, 138, 141 Errors, 31f ; see also Distribution, Distribution function, Fractional errors, Mistakes, Propagation of errors, True value chance (random), 3, 6, 34, 40, 43, 51f, 82, 
85, 135, 143f, 180f maximum, 57, 118 systematic, 3f, 6, 127, 130f, 134f, 142, 177, 181 instrumental, 31f, 127 personal, 3 If theoretical, 31f, 141 Expectation value, 43, 61, 65, 68f, 74, 105, 108 for binomial distribution, 43f, 56, 72, 145f, 170, 200 for Chi-square distribution, 200 for normal distribution, 66, 72, 80 96, 105, 108, 127, 151, 158f, 168f, 172f, 181, 188f for Poisson distribution, 45f, 56, 146f for range distribution, 111, 150f Experiments, illustrative, Behr free fall, 84f, 91, 94f, 141 dart dropping, 54f, 58, 60, 66, 80, 110, 168f, 201f focusing, 52f, 55 Kundt's tube, 94, 115 Newton's Law of Cooling, 23, 29, 102f Experiments, role of, 2 Factorial (Gamma) function, 41, 49f, 69, 149f, 164, 175, 178, 199, 201, 215f, 222 F-distribution and testing, 177f, 203 Fractional error, 6, 9f, 15, 123 Frequency distribution, 48 Functions ; see Mathematical functions Graphic analysis, 21f, 90, 101, 114; see also Curve fitting, Curve plotting Histogram, 53f, 58f, 66, 68, 72, 80, 110 Identical readings, 107f Independent observations, 123, 157f, 172f, 185 Laboratory work, 1, 4, 8, 32, 85 Least squares, linear problem, 88f, 118 method of, 82f, 118f, 193f, 198, 212f nonlinear problem, lOOf, 115f, 118 Limits of reliability, 79, 108f, 136 Line of regression, 189f, 193 Linear correlation; see Correlation coefficient Linear observation equations; see Straight line, Observation equations Mathematical functions of common occurrence, 28, 115, 140f; see also Polynomials, Straight line Mean; see Average, Expectation value Measure of precision, 65f, 73f, 95f, 104f, 111, 118, 123, 127, 129f; see also Average deviation, Chi-square distribution, F- distribution, Probable error, Standard deviation Mistakes, 3, 32, 53, 55, 108; see also Rejection of observations Most probable value (or event), 24, 43, 45, 48, 51, 56f, 61, 65, 67, 70f, 73, 83, 97, 99f, 102, 113f, 118, 138f, 147 Multinomial distribution, 148f, 152, 158, 200 Nonlinear observation equations, 27f, 115f; see also 
Polynomials INDEX 245 Normal (Gaussian) distribution, 21, 33, 52, 59f, 73, 75f, 80f, 83f, 96, 108, 111, 146f, 150f, 156, 158f, 162f, 168f, 171, 188f, 199f, 207f, 209 Normal equations, 87f, 91, 98f, 124f, 226 Normalization, 64, 154, 163f, 176, 192, 196, 207f Null hypothesis, 180f, 193, 197f, 202 Numbers, types of, 10, 12, 14f, 31; see also Rounding off, Significant figures Observation equations, 86f, 92f, 98f, 107, 118, 124f, 142; see also Identical readings Parameters, of observation equations, 27f, 67, 94, 143; see also Average, Observation equations of distributions, 78, 168, 190; see also Expectation value, Standard deviation Permutations (ordered groups), 39f, 49f Poisson distribution, 45f, 50f, 56, 64, 68f, 75, 77f, 146f Polynomials, 28, 88, 113f, 116, 137f, 140f Precision; see Accuracy, Measure of precision Probability, 34f, 73; see also Distribution; Distribution function; Experiments, illustrative, dart dropping analytical (method), 34f, 42f, 47, 49f, 65, 68, 74, 80f compound (joint), 50, 58, 61f, 7 If, 83, 96, 149, 159f, 166f, 172f, 177f, 190, 194 elementary (a priori), 34f, 43, 45, 47, 61, 145, 148, 158f, 170 experimental (method), 34f, 41f, 49, 56, 58, 68 Probable error, 21, 73, 76f, 130, 155 Propagation of error, 118f; see also Weights in products, 122 in sums, 122 Radioactive counting, 46f, 51, 146f, 200 Random variable; see Variable Range distribution and testing, 11 If, 116, 150f, 158, 163, 200, 214 Rectangular distribution, 75, 107, 200 Rejection of observations, 57, 108f, 116f, 135f, 138 Residuals, 21, 56, 61f, 67, 83, 86, 88, 102, 104f, 108f, 111, 113, 128, 130, 132f, 158f, 163, 168, 182f, 194 large, 53, 55, 81, 108f, 135f; see also Mistakes, Rejection of observations Rounding off, 7f, 12f ; see also Numbers, Significant figures Significant figures, lOf, 15f, 90, 137; see also Rounding off Simultaneous equations, 14, 89f, 93, 98, 223f ; see also Determinants Smooth curves, see Curve plotting Standard deviation, 74f, 96f, 104f, 143f for binomial 
distribution, 74, 145, 170, 172, 199 for chi-square distribution, 200 for normal distribution, 74, 97f, 106f, 113f, 118f, 151, 159f, 176, 188f, 201, 209 for Poisson distribution, 74, 146f for range distribution, 154f for rectangular distribution, 74 for parameters of observation equations, 106f, 112f, 116, 162 Straight line, 24f, 27f, 91f, 95, 113f, 116, 126, 132f, 136, 138, 140, 188f Student's t distribution, 172f, 196f, 201f, 204; see also Correlation coefficient, sample 246 INDEX Taylor's expansion, 18f for more than one variable, 19, 60, 101, 103 True value (or error), 7, 21, 56f, 61f, 67, 82, 104f, 119, 151, 160f, 172f, 188f, 193f ; see also Universe Universe (true or parent distribution), 57, 59f, 67, 82, 96, 105, 108, 119f, 127f, 156, 159, 172, 176, 180£ 188, 193; 203 Variable, changes of in integration, 153, 159f, 166f, 174f, 192, 195f, 215, 217f dependent, 22, 92, 143 independent, 22, 92, 126, 143 evenly spaced, 94f, 114f random, 144, 189 Variance, 177f, 203 Weights, of observations, 6, 91, 93, 95f, 100, 103f, 114f, 124f, 133, 137f, 183 of parameters of observation equations, 104f, 122, 124f, 129f, 162, 212f ABCDE69876 THE AUTHORS Emerson M. Pugh, Emeritus Professor and Principal Research Physicist at the Carnegie Institute of Technology, is also affili- ated with the Johns Hopkins University Applied Physics Labora- tory, the Aberdeen Proving Ground, and the U.S. Bureau of Mines. He received the B.S. degree from the Carnegie Institute of Technology, the M.S. degree from the University of Pitts- burgh, and the Ph.D. degree from the California Institute of Technology. Dr. Pugh's fields of special interest are the two HaJI effects of ferromagnetic materials, and the physics of high explosives. He is the coauthor of Principles of Electricity and Magnetism, also published by Addison-Wesley. George H. Winslow is an associate physicist In the Chemical Division at Argonne National Laboratory, He received the B.S., M.S., and D.Sc. 
degrees from the Carnegie Institute of Tech- nology, where he has also served as an instructor. His areas of research specialization are molecular beams, high tem- perature thermodynamics, and the defect structure of solid compounds. This text constitutes an introduction to the theory of errors and statistical analysis. Its purpose is to encourage a wider use in physical science reporting of precisely reportable methods of error analysis than is presently the case. The book starts on a relatively elementary level and builds to a complexity comparable to that of real laboratory prob- lems. Because of this, the book can be used as a text but retained as a reference. Careful analyses of data, principles and simple tools of analysis, and more sophisticated tools for complex situations are treated in logical succession. In the few cases where mathematics beyond the elementary calculus Is needed, It appears in the appendices. Practical examples that make the context useful and understandable are used throughout. ADDISON-WESLEY PUBLISHING COMPANY READING, MASSACHUSETTS ■ PALO ALTO ■ LONDON ■ DON MILLS, ONTARIO 1'RINTKD IN" U.S.A. 6011